Abstract

Eugene Wigner's revolutionary vision predicted that the energy levels of large complex quan- 
tum systems exhibit a universal behavior: the statistics of energy gaps depend only on the basic 
symmetry type of the model. These universal statistics show strong correlations in the form of 
level repulsion and they seem to represent a new paradigm of point processes that are charac- 
teristically different from the Poisson statistics of independent points. 

Simplified models of Wigner's thesis have recently become mathematically accessible. For 
mean field models represented by large random matrices with independent entries, the celebrated 
Wigner-Dyson-Gaudin-Mehta (WDGM) conjecture asserts that the local eigenvalue statistics are
universal. For invariant matrix models, the eigenvalue distributions are given by a log-gas with
potential $V$ and inverse temperature $\beta = 1, 2, 4$, corresponding to the orthogonal, unitary and
symplectic ensembles. For $\beta \notin \{1, 2, 4\}$, there is no natural random matrix ensemble behind
this model, but the analogue of the WDGM conjecture asserts that the local statistics are 
independent of V . 

In these lecture notes we review the recent solution to these conjectures for both invariant 
and non-invariant ensembles. We will discuss two different notions of universality in the sense 
of (i) local correlation functions and (ii) gap distributions. We will demonstrate that the local 
ergodicity of the Dyson Brownian motion is the intrinsic mechanism behind the universality. In 
particular, we review the solution of Dyson's conjecture on the local relaxation time of the Dyson 
Brownian motion. Additionally, the gap distribution requires a De Giorgi-Nash-Moser type 
Hölder regularity analysis for a discrete parabolic equation with random coefficients. Related
questions such as the local version of Wigner's semicircle law and delocalization of eigenvectors
will also be discussed. We will also explain how these results can be extended beyond the mean 
field models, especially to random band matrices. 
1 Introduction

1.1 The pioneering vision of Wigner 

"Perhaps I am now too courageous when I try to guess the distribution of the dis- 
tances between successive levels (of energies of heavy nuclei). Theoretically, the 
situation is quite simple if one attacks the problem in a simpleminded fashion. The 
question is simply what are the distances of the characteristic values of a symmetric 
matrix with random coefficients."

Eugene Wigner on the Wigner surmise, 1956 

Large complex systems often exhibit remarkably simple universal patterns as the number of 
degrees of freedom increases. The simplest example is the central limit theorem: the fluctuation 
of the sums of independent random scalars, irrespective of their distributions, follows the Gaussian 
distribution. The other cornerstone of probability theory identifies the Poisson point process as 
the universal limit of many independent point-like events in space or time. These mathematical 
descriptions assume that the original system has independent (or at least weakly dependent) con- 
stituents. What if independence is not a realistic approximation and strong correlations need to be 
modelled? Is there a universality for strongly correlated models? 

At first sight this seems an impossible task. While independence is a unique concept, correlations 
come in many different forms; a-priori there is no reason to believe that they all behave similarly. 
Nevertheless they do, according to the pioneering vision of Wigner [86] at least if they originate 
from certain physical systems and if the "right" question is asked. The actual correlated system he 
studied was the energy levels of heavy nuclei. Looking at spectral measurement data, it is obvious 
that the eigenvalue density (or density of states, as it is called in physics) heavily depends on the 
system. But Wigner asked a different question: what about the distribution of the rescaled energy 
gaps? He discovered that the difference of consecutive energy levels, after rescaling with the local 
density, shows a surprisingly universal behavior. He even predicted a universal law, given by the 
simple formula (called the Wigner surmise), 

$$\mathbb{P}\big(\widetilde E_j - \widetilde E_{j-1} = s + \mathrm{d}s\big) \approx \frac{\pi s}{2} \exp\Big(-\frac{\pi}{4} s^2\Big)\,\mathrm{d}s, \qquad (1.1)$$

where $\widetilde E_j = \varrho E_j$ denotes the rescaling of the actual energy level $E_j$ by the density of states $\varrho$ near
the energy $E_j$. This law is characteristically different from the gap distribution of the Poisson
process, which is the exponential distribution $e^{-s}\,\mathrm{d}s$. The prefactor $s$ in (1.1) indicates a level
repulsion for the point process $\widetilde E_j$, i.e. the eigenvalues are strongly correlated.
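As a quick numerical illustration of the contrast between the two gap laws, the following sketch (our own, not part of the original notes) evaluates the Wigner surmise density and the exponential density. It checks that the surmise is a probability density with mean gap one and that it vanishes linearly near zero while the exponential density does not. The function names and the integration grid are illustrative assumptions.

```python
import numpy as np

def wigner_surmise(s):
    """Density (pi*s/2) * exp(-pi*s^2/4) from (1.1)."""
    s = np.asarray(s, dtype=float)
    return 0.5 * np.pi * s * np.exp(-0.25 * np.pi * s**2)

def poisson_gap(s):
    """Exponential gap density exp(-s) of a rate-one Poisson process."""
    s = np.asarray(s, dtype=float)
    return np.exp(-s)

def integrate(f, upper=40.0, n=400_001):
    """Trapezoidal rule on [0, upper]; the integrands are negligible beyond."""
    s = np.linspace(0.0, upper, n)
    y = f(s)
    h = s[1] - s[0]
    return h * (y.sum() - 0.5 * (y[0] + y[-1]))

total_mass = integrate(wigner_surmise)                  # should be close to 1
mean_gap = integrate(lambda s: s * wigner_surmise(s))   # should be close to 1
# Level repulsion: small gaps are far less likely than for independent points.
small_gap_ratio = wigner_surmise(0.01) / poisson_gap(0.01)
```

The ratio at small $s$ is of order $s$, which is the quantitative content of the level repulsion mentioned above.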

Comparing measurement data from various experiments, Wigner's pioneering vision was that
the energy gap distribution (1.1) of complicated quantum systems is essentially universal; it depends
only on the basic symmetries of the model (such as time-reversal invariance). This thesis has never been
rigorously proved for any realistic physical system, but experimental data and extensive numerics
leave no doubt about its correctness (see [M] for an overview).

Wigner not only predicted universality in complicated systems, but he also discovered a re- 
markably simple mathematical model for this new phenomenon: the eigenvalues of large random 
matrices. For practical purposes, Hamilton operators of quantum models are often approximated 
by large matrices that are obtained from some type of discretization of the original continuous 
model. These matrices have specific forms dictated by physical rules. Wigner's bold step was to 
neglect all details and consider the simplest random matrix whose entries are independent and 
identically distributed. The only physical property he retained was the basic symmetry class of 
the system; time reversal physical models were modelled by real symmetric matrices, while systems 
without time reversal symmetry (e.g. with magnetic fields) were modelled by complex Hermitian 
matrices. As far as the gap statistics are concerned, this simple-minded model reproduced the 
behavior of the complex quantum systems! The universal behavior extends to the joint statistics 
of several consecutive gaps which are essentially equivalent to the local correlation functions of the 
point process $\widetilde E_j$. From a mathematical point of view, a universal strongly correlated point process
was found. The natural representatives of these universality classes are the random matrices with 
independent identically distributed Gaussian entries. These are called the Gaussian orthogonal 
ensemble (GOE) and the Gaussian unitary ensemble (GUE) in case of real symmetric and complex 
Hermitian matrices, respectively. 

Since Wigner's discovery, random matrix statistics have been found everywhere in physics and beyond,
wherever nontrivial correlations prevail. Among many other applications, random matrix theory 
(RMT) is present in chaotic quantum systems in physics, in principal component analysis in statis- 
tics, in communication theory and even in number theory. In particular, the zeros of the Riemann 
zeta function on the critical line are expected to follow RMT statistics due to a spectacular result 
of Montgomery [68] . 

In retrospect, Wigner's idea should have received even more attention. For centuries, the pri- 
mary territory of probability theory was to model uncorrelated or weakly correlated systems. The 
surprising ubiquity of random matrix statistics is strong evidence that it plays a similar
fundamental role for correlated systems as the Gaussian distribution and the Poisson point process play for
uncorrelated systems. RMT seems to provide essentially the only universal and generally com- 
putable pattern for complicated correlated systems. 

In fact, a few years after Wigner's seminal paper [86], Gaudin [52] discovered another
remarkable property of this new point process: the correlation functions have a determinantal 
structure, at least if the distributions of the matrix elements are Gaussian. The algebraic identities 
within the determinantal form opened up the route to calculations and to obtain explicit formulas 
for local correlation functions. In particular, the gap distribution for the complex Hermitian case 
is given by a Fredholm determinant involving Hermite polynomials. In fact, Hermite polynomials 
were first introduced in the context of random matrices by Mehta and Gaudin [66] earlier. Dyson 
and Mehta [23, 25, 65] later extended this exact calculation to correlation functions and to
other symmetry classes. When compared with the exact formula, the Wigner surmise (1.1), based
upon a simple 2 x 2 matrix model, turned out to be quite accurate. While the determinantal
structure is present only in Gaussian Wigner matrices, the paradigm of local universality predicts 
that the formulas for the local eigenvalue statistics obtained in the Gaussian case hold for general 
distributions as well. 
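The 2 x 2 model behind the surmise can be simulated directly. The sketch below is a minimal illustration under an assumed normalisation (diagonal entries of variance 1, off-diagonal entry of variance 1/2, a common convention for a 2 x 2 real symmetric Gaussian matrix); the sample size and the seed are arbitrary choices. It compares the empirical probability of a small rescaled gap with the value predicted by (1.1).

```python
import numpy as np

rng = np.random.default_rng(1)
samples = 200_000

# 2 x 2 real symmetric Gaussian matrix [[a, b], [b, c]]:
# diagonal variance 1, off-diagonal variance 1/2 (assumed normalisation).
a = rng.normal(0.0, 1.0, samples)
c = rng.normal(0.0, 1.0, samples)
b = rng.normal(0.0, np.sqrt(0.5), samples)

# The eigenvalue gap of a 2 x 2 symmetric matrix in closed form.
gap = np.sqrt((a - c) ** 2 + 4.0 * b ** 2)
s = gap / gap.mean()   # rescale so that the mean gap is one

def surmise_cdf(t):
    """Cumulative distribution function of (pi*s/2) * exp(-pi*s^2/4)."""
    return 1.0 - np.exp(-np.pi * t ** 2 / 4.0)

empirical_small = np.mean(s < 0.5)   # fraction of rescaled gaps below 1/2
predicted_small = surmise_cdf(0.5)   # same probability from the surmise
```

For this 2 x 2 model the agreement is exact up to sampling error; for large matrices the surmise is only an approximation of the true gap law.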
1.2 Physical models

The ultimate mathematical goal is to prove Wigner's vision for a very large class of realistic quantum 
mechanical models. This is extremely hard, since the local statistics involve tracking individual 
eigenvalues in the bulk spectrum. Wigner's original model, the energy levels of heavy nuclei, is a 
strongly interacting many-body quantum system. The rigorous analysis of such a model with the
required precision is beyond the reach of current mathematics. 

A much simpler question is to neglect all interactions and to study the natural one-body quantum
model, the Schrödinger operator $-\Delta + V$ with a potential $V$ on $\mathbb{R}^d$. The complexity comes
from assuming that V is generic in some sense, in particular to exclude models with additional 
symmetries that may lead to non-universal eigenvalue correlations. Two well-studied examples are 
(i) the random Schrödinger operators where $V = V(x)$ is a random field with a short range correlation,
and (ii) quantum chaos models, where $V$ is generic but fixed and the statistical ensemble
is generated by sampling the spectrum in small spectral windows at high energies (an alternative 
formulation uses the semiclassical limit). 

Unfortunately, there are essentially no rigorous results on local spectral universality even in these 
one-body models. Random Schrödinger operators are conjectured to exhibit a metal-insulator transition
that was discovered by Anderson [4]. The high disorder regime is relatively well understood
since the seminal work of Fröhlich and Spencer [50] (an alternative proof is given by Aizenman and
Molchanov [1]). However, in this regime the eigenfunctions are localized and thus eigenfunctions belonging
to neighboring eigenvalues are typically spatially separated, hence uncorrelated. Therefore,
due to localization, the system does not have sufficient correlation to fall into the RMT universality
class; in fact the local eigenvalue statistics follow the Poisson process [67]. In contrast, in the
low disorder regime, starting from three spatial dimensions and away from the spectral edges, the
eigenfunctions are conjectured to be delocalized (extended states conjecture). Spatially overlapping 
eigenfunctions introduce correlations among eigenvalues and it is expected that the local statistics 
are given by RMT. In the theoretical physics literature, the existence of the delocalized regime 
and its RMT statistics are considered as facts, supported both by non-rigorous arguments and 
numerics. One of the most intriguing approaches is via supersymmetric (SUSY) functional integrals
that remarkably reproduce all formulas obtained by the determinantal calculations in a much more
general setup, but in a non-rigorous way due to neglecting highly oscillatory terms. The rigorous
mathematics seriously lags behind these developments; even the existence of the delocalized regime 
is not proven, let alone detailed spectral statistics. 

Judged from the horizons of theoretical physics, rigorous mathematics does not fare much better 
in the quantum chaos models either. The grand vision is that the quantization of an integrable 
classical Hamiltonian system exhibits Poisson eigenvalue statistics and a chaotic classical system 
gives rise to RMT statistics [8, 10]. While Poisson statistics have been shown to emerge in some specific
integrable models [63, 76, 79], there is no rigorous result on the RMT statistics. Recently there has
been remarkable mathematical progress in quantum unique ergodicity (QUE) that predicts that
all eigenfunctions of chaotic systems are uniformly distributed all over the space, at least in some 
macroscopic sense. For arithmetic domains QUE has been proved in [61]. For general manifolds 
much less is known, but a lower bound on the topological entropy of the support of the limiting 
densities of eigenfunctions excludes that eigenfunctions are supported only on a periodic orbit [2]. 
Very roughly, QUE can be considered as the analogue of the extended states for random Schrödinger
operators. Theoretically, the overlap of eigenfunctions should again lead to correlations between 
neighboring eigenvalues, but their direct quantitative analysis would require a much more precise 
understanding of the eigenfunctions. 
1.3 Random matrix ensembles



In these lectures we consider even simpler models to test Wigner's universality hypothesis, namely 
the random matrix ensemble itself. The main goal is to show that their eigenvalues follow the local 
statistics of the Gaussian Wigner matrices which have earlier been computed explicitly by Dyson, 
Gaudin and Mehta. The statement that the local eigenvalue statistics are independent of the law
of the matrix elements is generally referred to as the universality conjecture of random matrices 
and we will call it the Wigner-Dyson-Gaudin-Mehta conjecture. It was first formulated in Mehta's 
treatise on random matrices [64] in 1967 and has remained a key question in the subject ever since. 
The goal of these lecture notes is to review the recent progress that has led to the proof of this 
conjecture and we sketch some important ideas. We will, however, not be able to present all aspects 
of random matrices and we refer the reader to recent comprehensive books [3, 17, 19].

1.3.1 Wigner ensembles 

To make the problem simpler, we restrict ourselves to either real symmetric or complex Hermitian 
matrices so that the eigenvalues are real. The standard model consists of N x N square matrices 
H = (hij) with matrix elements having mean zero and variance 1/N, i.e., 

$$\mathbb{E}\, h_{ij} = 0, \qquad \mathbb{E}|h_{ij}|^2 = \frac{1}{N}, \qquad i, j = 1, 2, \ldots, N. \qquad (1.2)$$

The matrix elements $h_{ij}$, $i, j = 1, \ldots, N$, are real or complex independent random variables subject
to the symmetry constraint $h_{ij} = \bar h_{ji}$. These ensembles of random matrices are called (standard)
Wigner matrices. We will always consider the limit as the matrix size goes to infinity, i.e., $N \to \infty$.
Every quantity related to $H$ depends on $N$, so we should have used the notation $H^{(N)}$ and $h^{(N)}_{ij}$,
etc., but for simplicity we will omit $N$ in the notation.

In Section [2] we will also consider generalizations of these ensembles, where we allow the matrix 
elements hij to have different distributions (but retaining independence). The main motivation 
is to depart from the mean-field character of the standard Wigner matrices, where the quantum 
transition amplitudes hij between any two sites i,j have the same statistics. The most prominent 
example is the random band matrix ensemble (see Example 2.1) that naturally interpolates between
standard Wigner matrices and random Schrödinger operators with a short range hopping
mechanism (see [80] for an overview). 

The first rigorous result about the spectrum of a random matrix of this type is the famous 
Wigner semicircle law [86], which states that the empirical density of the eigenvalues, $\lambda_1, \lambda_2, \ldots, \lambda_N$,
under the normalization (1.2), is given by

$$\varrho_N(x) := \frac{1}{N} \sum_{i=1}^N \delta(x - \lambda_i) \rightharpoonup \varrho_{sc}(x) := \frac{1}{2\pi} \sqrt{(4 - x^2)_+} \qquad (1.3)$$

in the weak limit as $N \to \infty$. The limit density is independent of the details of the distribution of
$h_{ij}$.
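The semicircle law is easy to observe numerically. The following sketch is an illustration only: it samples one real symmetric Wigner matrix with the normalization (1.2), using uniformly distributed entries as an arbitrary non-Gaussian choice, and compares the fraction of eigenvalues in $[-1, 1]$ with the semicircle prediction. The matrix size, the seed and the helper names are assumptions made for this example.

```python
import numpy as np

def sample_wigner(n, rng):
    """Real symmetric matrix with centred entries of variance 1/n, as in (1.2).
    Entries are uniform rather than Gaussian (an illustrative choice)."""
    # The uniform law on [-sqrt(3), sqrt(3)] has mean 0 and variance 1.
    a = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(n, n))
    upper = np.triu(a)              # entries with i <= j
    h = upper + np.triu(a, 1).T     # mirror the strict upper triangle
    return h / np.sqrt(n)

def semicircle_mass(a, b):
    """Integral of (1/(2*pi)) * sqrt(4 - x^2) over [a, b], for -2 <= a <= b <= 2."""
    def antiderivative(x):
        return (x * np.sqrt(4.0 - x * x) / 2.0 + 2.0 * np.arcsin(x / 2.0)) / (2.0 * np.pi)
    return antiderivative(b) - antiderivative(a)

rng = np.random.default_rng(0)
n = 400
h = sample_wigner(n, rng)
eigs = np.linalg.eigvalsh(h)

empirical = np.mean((eigs >= -1.0) & (eigs <= 1.0))   # fraction of eigenvalues in [-1, 1]
predicted = semicircle_mass(-1.0, 1.0)                # semicircle mass of [-1, 1]
```

Replacing the uniform law by any other centred law with the same variance should leave the comparison essentially unchanged, which is the macroscopic universality expressed by (1.3).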

The Wigner surmise (1.1) is a much finer problem since it concerns individual eigenvalues and
not only their behavior on the macroscopic scale. To understand it, we introduce correlation functions.
If $p_N(\lambda_1, \lambda_2, \ldots, \lambda_N)$ denotes the joint probability density of the (unordered) eigenvalues, then the
$n$-point correlation functions (marginals) are defined by

$$p^{(n)}_N(\lambda_1, \lambda_2, \ldots, \lambda_n) := \int_{\mathbb{R}^{N-n}} p_N(\lambda_1, \ldots, \lambda_n, \lambda_{n+1}, \ldots, \lambda_N)\, \mathrm{d}\lambda_{n+1} \cdots \mathrm{d}\lambda_N. \qquad (1.4)$$
To keep this introduction simple, we state the corresponding results in terms of the eigenvalue correlation
functions for Hermitian $N \times N$ matrices. In the Gaussian case (GUE) the joint probability
density of the eigenvalues can be expressed explicitly as 

$$p_N(\lambda_1, \lambda_2, \ldots, \lambda_N) = \mathrm{const.} \prod_{i<j} (\lambda_i - \lambda_j)^2 \prod_{j=1}^N e^{-\frac{1}{2} N \lambda_j^2}, \qquad (1.5)$$

where the normalization constant can be computed explicitly. The Vandermonde determinant 
structure allows one to compute the $k$-point correlation functions in the large $N$ limit via Hermite
polynomials that are the orthogonal polynomials with respect to the Gaussian weight function. 

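The result of Dyson, Gaudin and Mehta asserts that for any fixed energy $E$ in the bulk of the
spectrum, i.e., $|E| < 2$, the small scale behavior of $p^{(n)}_N$ is given explicitly by

$$\frac{1}{[\varrho_{sc}(E)]^n}\, p^{(n)}_N\Big(E + \frac{\alpha_1}{N \varrho_{sc}(E)}, E + \frac{\alpha_2}{N \varrho_{sc}(E)}, \ldots, E + \frac{\alpha_n}{N \varrho_{sc}(E)}\Big) \rightharpoonup \det\big(K(\alpha_i, \alpha_j)\big)_{i,j=1}^n, \qquad (1.6)$$

where $K$ is the celebrated sine kernel

$$K(x, y) = \frac{\sin \pi (x - y)}{\pi (x - y)}. \qquad (1.7)$$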

Note that the limit in (1.6) is independent of the energy $E$ as long as it lies in the bulk of the
spectrum. The rescaling by a factor $N^{-1}$ of the correlation functions in (1.6) corresponds to the
typical distance between consecutive eigenvalues and we will refer to the law under such scaling as
local statistics. Note that the correlation functions do not factorize, i.e. the eigenvalues are strongly
correlated even though the matrix elements are independent. Similar but more complicated formulas
were obtained for symmetric matrices and also for the self-dual quaternion random matrices which 
is the third symmetry class of random matrix ensembles. 
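The failure of factorization can be seen concretely from the sine kernel. The sketch below assumes that the limiting $n$-point function has the standard determinantal form $\det(K(\alpha_i, \alpha_j))_{i,j=1}^n$ for the complex Hermitian case; the function names are ours. It evaluates this determinant for two nearby points and for two distant points.

```python
import numpy as np

def sine_kernel(x, y):
    """K(x, y) = sin(pi (x - y)) / (pi (x - y)), with K(x, x) = 1."""
    # np.sinc(t) equals sin(pi t) / (pi t) and handles t = 0 correctly.
    return np.sinc(np.asarray(x, dtype=float) - np.asarray(y, dtype=float))

def limiting_correlation(alphas):
    """det(K(alpha_i, alpha_j)) for a list of rescaled positions alpha_i."""
    a = np.asarray(alphas, dtype=float)
    return float(np.linalg.det(sine_kernel(a[:, None], a[None, :])))

one_point = limiting_correlation([0.3])        # constant density after rescaling
two_close = limiting_correlation([0.0, 0.05])  # two points much closer than the mean gap
two_far = limiting_correlation([0.0, 7.5])     # two points several mean gaps apart
product_of_one_points = limiting_correlation([0.0]) * limiting_correlation([7.5])
```

For nearby points the two-point function is close to zero, far below the product of the one-point functions, while at large separation it approaches that product. This is the level repulsion of (1.1) expressed in terms of correlation functions.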

The convergence in (1.6) holds for each fixed $|E| < 2$ and uniformly in $(\alpha_1, \ldots, \alpha_n)$ in any
compact subset of $\mathbb{R}^n$. Fix now $k$ compact subsets $A_1, \ldots, A_k$ in $\mathbb{R}$. From (1.6) one can compute
the distribution of the number $n_j$ of the rescaled eigenvalues $\widetilde\lambda_\alpha := N(\lambda_\alpha - E)\varrho_{sc}(E)$ in $A_j$ around
a fixed energy $|E| < 2$. The limit of the joint probabilities

$$\mathbb{P}\big(\#\{\widetilde\lambda_\alpha \in A_j\} = n_j, \; j = 1, 2, \ldots, k\big) \qquad (1.8)$$

is given as derivatives of a Fredholm determinant involving the sine kernel. Clearly (1.8) gives
a complete local description of the rescaled eigenvalues as a point process around a fixed energy
$E$. In particular it describes the distribution of the eigenvalue gap that contains a fixed energy
$E$. However, (1.8) does not determine the distribution of the gap with a fixed label, e.g. the gap
$\lambda_{N/2+1} - \lambda_{N/2}$. Only the cumulative statistics of many consecutive gaps can be deduced, see [17]
for a precise formulation. The slight discrepancy between the statements at fixed energy and with 
fixed label leads to involved technical complications. 



1.3.2 Invariant ensembles 

The explicit formula (1.5) is special for Gaussian Wigner matrices; if $h_{ij}$ are independent but non-Gaussian,
then no analogous explicit formula is known for the joint probability density. Gaussian
Wigner matrices have this special property because their distribution is invariant under a change of basis.
The derivation of (1.5) relies on the fact that in the diagonalization $H = U \Lambda U^*$ of $H$,
where $\Lambda$ is diagonal and $U$ is unitary, the distributions of $U$ and $\Lambda$ decouple. The Gaussian measure
of $h_{ij}$ with the normalization (1.2) can also be expressed as

$$\exp\Big(-\frac{1}{2} N\, \mathrm{Tr}\, H^2\Big)\, \mathrm{d}H = \exp\Big(-\frac{1}{2} N\, \mathrm{Tr}\, \Lambda^2\Big)\, \mathrm{d}(U \Lambda U^*), \qquad (1.9)$$

where $\mathrm{d}H$ is the Lebesgue measure on Hermitian matrices. The Vandermonde determinant in (1.5)
originates from integrating the Jacobian $\mathrm{d}(U \Lambda U^*)/\mathrm{d}\Lambda$ over the unitary group. A similar argument
holds for real symmetric matrices with orthogonal conjugations; the only difference is that the exponent
2 of the Vandermonde determinant becomes 1. The exponent is 4 for the third symmetry class
of Wigner matrices, the self-dual quaternion matrices with symmetry group being the symplectic 
matrices (Gaussian symplectic ensemble, GSE). 

Starting from (1.5), there are two natural generalizations of Gaussian Wigner matrices. One
direction is the Wigner matrices with non-Gaussian but independent entries that we have already
introduced in Section 1.3.1. Another direction is to consider a more general real function $V(H)$
of $H$ instead of the quadratic $H^2$ in (1.9). Since invariance still holds, $\mathrm{Tr}\, V(H) = \mathrm{Tr}\, V(U \Lambda U^*) = \mathrm{Tr}\, V(\Lambda)$,
the same argument gives (1.5), with $V(\lambda_j)$ instead of $\lambda_j^2/2$, for the correlation functions of
$\exp(-N\, \mathrm{Tr}\, V(H))$. These are called invariant ensembles with potential $V$. Their matrix elements
are in general correlated except in the Gaussian case.

Invariant ensembles in all three symmetry classes can be given simultaneously by the probability
measure

$$Z^{-1} \exp\Big(-\frac{\beta}{2} N\, \mathrm{Tr}\, V(H)\Big)\, \mathrm{d}H,$$

where $N$ is the size of the matrix $H$, $V$ is a real valued potential and $Z$ is the normalization
constant. The positive parameter $\beta$ is determined by the symmetry class; its value is 1, 2 or 4 for
real symmetric, complex Hermitian and self-dual quaternion matrices, respectively. The Lebesgue
measure $\mathrm{d}H$ is understood over the matrices in the same class. The probability distribution of the
eigenvalues $\lambda = (\lambda_1, \ldots, \lambda_N)$ is given by the explicit formula (cf. (1.5))

$$\mu(\lambda)\, \mathrm{d}\lambda \sim e^{-\beta N \mathcal{H}(\lambda)}\, \mathrm{d}\lambda \quad \text{with Hamiltonian} \quad \mathcal{H}(\lambda) := \sum_{k=1}^N \frac{1}{2} V(\lambda_k) - \frac{1}{N} \sum_{1 \le i < j \le N} \log(\lambda_j - \lambda_i). \qquad (1.10)$$

The key structural ingredient of this formula, the logarithmic interaction that gives rise to
the Vandermonde determinant, is the same as in the Gaussian case, (1.5). Thus all previous
computations, developed for the Gaussian case, can be carried out for $\beta = 1, 2, 4$, provided that the
Gaussian weight function for the orthogonal polynomials is replaced with the function $e^{-\beta V(x)/2}$.
The analysis of the correlation functions depends critically on the asymptotic properties of the
corresponding orthogonal polynomials. 
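To make the relation between the log-gas Hamiltonian in (1.10) and the Gaussian formula (1.5) concrete, the following sketch evaluates the unnormalized log-density $-\beta N \mathcal{H}(\lambda)$ for a small configuration. It is an illustration under our own conventions: the function names are hypothetical, and we use $\log|\lambda_j - \lambda_i|$ so that the points need not be ordered.

```python
import numpy as np

def log_gas_hamiltonian(lam, potential):
    """H(lam) = sum_k V(lam_k)/2 - (1/N) * sum_{i<j} log|lam_j - lam_i|."""
    lam = np.asarray(lam, dtype=float)
    n = lam.size
    i, j = np.triu_indices(n, k=1)   # all index pairs with i < j
    interaction = np.sum(np.log(np.abs(lam[j] - lam[i])))
    return 0.5 * np.sum(potential(lam)) - interaction / n

def log_density(lam, potential, beta):
    """Unnormalized log-density -beta * N * H(lam) of the measure in (1.10)."""
    return -beta * len(lam) * log_gas_hamiltonian(lam, potential)

def gaussian(x):
    """Quadratic potential V(x) = x^2 / 2."""
    return x ** 2 / 2.0

spread = np.array([-1.0, 0.0, 1.0])    # well separated points
pinched = np.array([-1.0, 0.0, 1e-3])  # two points almost colliding

# For beta = 2 and the Gaussian potential this reproduces the logarithm of (1.5):
# log(prod_{i<j} (lam_i - lam_j)^2) - (N/2) * sum_j lam_j^2 = log(4) - 3 for `spread`.
value_spread = log_density(spread, gaussian, 2.0)
value_pinched = log_density(pinched, gaussian, 2.0)
```

The logarithmic interaction heavily penalizes the nearly colliding configuration, and nothing in the computation requires $\beta$ to be one of the classical values.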

While the asymptotics of the Hermite polynomials for the Gaussian case are well known, the
extension of the necessary analysis to a general potential is a demanding task; important progress
was made since the late 1990s by Fokas-Its-Kitaev [49], Bleher-Its [9], Deift et al. [17, 20, 21],
Pastur-Shcherbina [71, 72] and more recently by Lubinsky [62]. These results concern the simpler
$\beta = 2$ case. For $\beta = 1, 4$, the universality was established only quite recently for analytic $V$ with
additional assumptions [18, 19, 59, 78] using earlier ideas of Widom [85]. The final outcome of these
sophisticated analyses is that universality holds for the measure (1.10) in the sense that the short
scale behavior of the correlation functions is independent of the potential $V$ (with appropriate
assumptions) provided that $\beta$ is one of the classical values, i.e., $\beta \in \{1, 2, 4\}$, that corresponds to
an underlying matrix ensemble.

Notwithstanding matrix ensembles or orthogonal polynomials, the measure (1.10) on $N$ points
$\lambda_1, \ldots, \lambda_N$ is perfectly well defined for any $\beta > 0$. It can be interpreted as the Gibbs measure
for a system of particles with external potential $\frac{1}{2} V$ and with a logarithmic interaction (log-gas)
at inverse temperature $\beta$. From this point of view $\beta$ is a continuous parameter and the classical
values $\beta = 1, 2, 4$ play apparently no distinguished role. It is therefore natural to extend the
universality problem to all non-classical $\beta$, but the orthogonal polynomial methods are difficult to
apply in this case. For any $\beta > 0$ the local statistics for the Gaussian case $V(x) = x^2/2$ are given
by a point process, denoted by $\mathrm{Sine}_\beta$. It can be obtained as the $a \to \infty$ limit of a suitable
rescaling of the shifted process $\mathrm{Airy}_\beta + a$. The $\mathrm{Airy}_\beta$ process itself is given by the low lying eigenvalues of the
one dimensional Schrödinger operator $-\frac{\mathrm{d}^2}{\mathrm{d}x^2} + x + \frac{2}{\sqrt{\beta}} b'_x$ on the positive half line, where $b'_x$ is the
white noise. The relation between Gaussian random matrices and random Schrödinger operators
is derived from a tridiagonal matrix representation [22]. Another convenient representation of the
$\mathrm{Sine}_\beta$ process is given by the "Brownian carousel" [75, 84].

Beyond random matrices, the log-gas can also be viewed as the only interacting particle model 
with a scale-invariant interaction and with a single relevant parameter, the inverse temperature 
$\beta$. It is believed to be the canonical model for strongly correlated systems and thus to play a
similarly fundamental role in probability theory as the Poisson process or the Brownian motion. 
Nevertheless, we still have very little information about its properties. Unlike the universality 
problem that is inherently analytical, many properties of the log-gas are destined, at first sight,
to be revealed by smart algebraic identities. Despite many trials by physicists and mathematicians,
the log-gas with a general $\beta$ seems to defy all algebraic attempts. We do not really understand
why the algebraic approach is suitable for $\beta = 2$, and to a lesser extent for $\beta = 1, 4$, but it fails for
any other $\beta$, while from an analytical point of view there is no difference between various values of
$\beta$. To understand this fascinating ensemble, a main goal is to develop general analytical methods
that work for any $\beta$.

1.4 Universality of the local statistics: the main results 

All universality results reviewed in the previous sections rely on some version of the explicit formula 
(1.10) that is not available for Wigner matrices with non-Gaussian matrix elements. The only result
prior to 2009 towards universality for Wigner matrices was the proof of Johansson [56] (extended
by Ben Arous-Peche [7]) for complex Hermitian Wigner matrices with a substantial Gaussian 
component. The hermiticity is necessary, since the proof still relies on an algebraic formula, a 
modification of the Harish-Chandra/Itzykson/Zuber integral observed first by Brezin and Hikami 
in this context [13].

To indicate the restrictions imposed by the usage of explicit formulas, we note that previous 
methods were not suitable to deal even with very small perturbations of the Gaussian Wigner 
case. For example, universality was already not known if only a few matrix elements of H had a 
distribution different from Gaussian. 

Given this background, the main challenge a few years ago was to develop a new approach to 
universality that does not rely on any algebraic identity. We believe that the genuine reason behind 
Wigner's universality is of analytic nature. Algebraic computations may be used to obtain explicit 
formulas for the most convenient representative of a universality class (typically the Gaussian case) , 
but only analytical methods have the power to deal with the general case. In light of the two main 
classes of random matrix ensembles, we set the following two main problems. 

Problem 1: Prove the Wigner-Dyson-Gaudin-Mehta conjecture, i.e. the universality for Wigner 
matrices with a general distribution for the matrix elements. 

Problem 2: Prove the universality of the local statistics for the log-gas (1.10) for all $\beta > 0$.

We were able to solve Problem 1 for a very general class of distributions. As for Problem 2, 
we solved it for the case of real analytic potentials $V$ assuming that the equilibrium measure is
supported on a single interval, which, in particular, holds for any convex potential. We will give a
historical overview of related results in Section 1.5.3.

The original universality conjectures, as formulated in Mehta's book [53], do not specify the 
type of convergence in (1.6). We focus on two types of results for both problems. First we show
that universality holds in the sense that local correlation functions around an energy $E$ converge
weakly if $E$ is averaged on a small interval of size $N^{-1+\varepsilon}$. Second, we prove the universality of the
joint distribution of consecutive gaps with fixed labels. 

We note that universality of the cumulative statistics of $N^\varepsilon$ gaps directly follows from the weak
convergence of the correlation functions but our result on a single gap requires a quite different 
approach. From the point of view of Wigner's original vision on the ubiquity of the random 
matrix statistics in seemingly disparate ensembles and physical systems, the issue of cumulative 
gap statistics versus single gap statistics is minuscule. Our main reason for pursuing the single gap
universality is less for the result itself; more importantly, we develop new methods to analyze the 
structure of the log-gases, which seem to represent the universal statistics of strongly correlated 
systems. In the next two sections we state the results precisely. 



1.4.1 Generalized Wigner matrices 

Our main results hold for a larger class of ensembles than the standard Wigner matrices, which we 
will call generalized Wigner matrices. 



Definition 1.1. ([45]) The real symmetric or complex Hermitian matrix ensemble $H$ with centred
and independent matrix elements $h_{ij} = \bar h_{ji}$, $i \le j$, is called a generalized Wigner matrix if the
following assumptions hold on the variances of the matrix elements $s_{ij} = \mathbb{E}|h_{ij}|^2$:

(A) For any $j$ fixed,

$$\sum_{i=1}^N s_{ij} = 1. \qquad (1.11)$$

(B) There exist two positive constants, $C_1$ and $C_2$, independent of $N$, such that

$$\frac{C_1}{N} \le s_{ij} \le \frac{C_2}{N}. \qquad (1.12)$$
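The two conditions of Definition 1.1 are simple to test on a given variance profile. The sketch below is a hypothetical helper, not taken from the cited papers: it checks (A) and (B) for the standard Wigner profile, for a circulant profile with varying variances, and for a narrow band profile that violates the lower bound in (B). The constants 0.5 and 2.0 and the size $N = 6$ are arbitrary illustrative choices.

```python
import numpy as np

def is_generalized_wigner(s, c1, c2, tol=1e-10):
    """Check (A): every column of s sums to 1, and (B): c1/N <= s_ij <= c2/N."""
    s = np.asarray(s, dtype=float)
    n = s.shape[0]
    symmetric = np.allclose(s, s.T, atol=tol)
    cond_a = np.allclose(s.sum(axis=0), 1.0, atol=tol)
    cond_b = bool(np.all(s >= c1 / n - tol) and np.all(s <= c2 / n + tol))
    return symmetric and cond_a and cond_b

n = 6
standard = np.full((n, n), 1.0 / n)   # standard Wigner matrix: s_ij = 1/N

# Circulant profile: the variance depends only on the cyclic distance between i and j.
profile = np.array([1.5, 1.0, 0.75, 0.5, 0.75, 1.0])
profile = profile / profile.sum()     # normalise so that each column sums to 1
idx = (np.arange(n)[:, None] - np.arange(n)[None, :]) % n
circulant = profile[idx]

# Narrow band: zero variance outside cyclic distance 1, so the lower bound in (B) fails.
band = np.where(np.minimum(idx, n - idx) <= 1, 1.0 / 3.0, 0.0)

checks = {
    "standard": is_generalized_wigner(standard, 0.5, 2.0),
    "circulant": is_generalized_wigner(circulant, 0.5, 2.0),
    "band": is_generalized_wigner(band, 0.5, 2.0),
}
```

The band profile still satisfies (A), which illustrates why band matrices need the relaxed version of (1.12) rather than Definition 1.1 as stated.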



The result on the correlation functions is the following theorem: 

Theorem 1.2 (Wigner-Dyson-Gaudin-Mehta conjecture for averaged correlation functions). [32,
Theorem 7.2] Suppose that $H = (h_{ij})$ is a complex Hermitian (respectively, real symmetric) generalized
Wigner matrix. Suppose that for some constants $\varepsilon > 0$, $C > 0$,

$$\mathbb{E}\big|\sqrt{N} h_{ij}\big|^{4+\varepsilon} \le C. \qquad (1.13)$$

Let $n \in \mathbb{N}$ and $O : \mathbb{R}^n \to \mathbb{R}$ be compactly supported and continuous. Let $E$ satisfy $|E| < 2$ and let
$\xi > 0$. Then for any sequence $b_N$ satisfying $N^{-1+\xi} \le b_N \le \big||E| - 2\big|/2$ we have

$$\lim_{N \to \infty} \int_{E - b_N}^{E + b_N} \frac{\mathrm{d}x}{2 b_N} \int_{\mathbb{R}^n} \mathrm{d}\alpha_1 \cdots \mathrm{d}\alpha_n\, O(\alpha_1, \ldots, \alpha_n)\, \frac{1}{\varrho_{sc}(E)^n} \big(p^{(n)}_N - p^{(n)}_{G,N}\big)\Big(x + \frac{\alpha_1}{N \varrho_{sc}(E)}, \ldots, x + \frac{\alpha_n}{N \varrho_{sc}(E)}\Big) = 0. \qquad (1.14)$$

Here $\varrho_{sc}$ is the semicircle law defined in (1.3), $p^{(n)}_N$ is the $n$-point correlation function of the eigenvalue
distribution of $H$ (1.4), and $p^{(n)}_{G,N}$ is the $n$-point correlation function of an $N \times N$ GUE
(respectively, GOE) matrix.

The condition (1.12) can be relaxed, see Corollary 8.3 [3D]. For example, the lower bound can
be changed to $N^{-9/8+\varepsilon}$. Alternatively, the upper bound $C_2 N^{-1}$ can be replaced with $N^{-1+\varepsilon_n}$ for some
$\varepsilon_n > 0$. For band matrices, the upper and lower bounds can be simultaneously relaxed.

We remark that for the complex Hermitian case the convergence of the correlation functions 
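can be strengthened to a convergence at each fixed energy, i.e. for any fixed $|E| < 2$ we have that

$$\lim_{N \to \infty} \int_{\mathbb{R}^n} \mathrm{d}\alpha_1 \cdots \mathrm{d}\alpha_n\, O(\alpha_1, \ldots, \alpha_n) \big(p^{(n)}_N - p^{(n)}_{G,N}\big)\Big(E + \frac{\alpha_1}{N \varrho_{sc}(E)}, \ldots, E + \frac{\alpha_n}{N \varrho_{sc}(E)}\Big) = 0. \qquad (1.15)$$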

The main ideas leading to the results (1.14) and (1.15) have been developed in a series of papers.
We will give a short overview of the key methods in Section 1.5.1 and of the related results in
Section 1.5.3.

The second result on generalized Wigner matrices asserts that the local gap statistics in the
bulk of the spectrum are universal for any generalized Wigner matrix; in particular they coincide with
those of the Gaussian case. To formulate the statement, we need to introduce the notation $\gamma_j$ for
the $j$-th quantile of the semicircle density, i.e. $\gamma_j = \gamma_j^{(N)}$ is defined by

$$\frac{j}{N} = \int_{-2}^{\gamma_j} \varrho_{sc}(x)\,dx. \qquad (1.16)$$

We also introduce the notation $[\![A, B]\!] := \{A, A+1, \dots, B\}$ for any integers $A < B$.
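The quantiles in (1.16) have no closed form, but they are easy to compute numerically. The following is a minimal sketch, assuming the semicircle density $\varrho_{sc}(x) = \frac{1}{2\pi}\sqrt{(4-x^2)_+}$; the function names `semicircle_cdf` and `semicircle_quantile` are illustrative choices and do not come from the text. It uses the explicit antiderivative of the density and a plain bisection.

```python
import math


def semicircle_cdf(x: float) -> float:
    """Integral of (1/(2*pi)) * sqrt(4 - t^2) over t in [-2, x] (clamped to [-2, 2])."""
    x = min(max(x, -2.0), 2.0)
    return 0.5 + x * math.sqrt(4.0 - x * x) / (4.0 * math.pi) + math.asin(x / 2.0) / math.pi


def semicircle_quantile(j: int, n: int, tol: float = 1e-12) -> float:
    """Solve j/n = semicircle_cdf(gamma) for gamma in [-2, 2] by bisection."""
    target = j / n
    lo, hi = -2.0, 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if semicircle_cdf(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Classical locations gamma_1, ..., gamma_N for a small illustrative N.
N = 10
gammas = [semicircle_quantile(j, N) for j in range(1, N + 1)]
```

The list `gammas` is increasing and symmetric about the origin in the sense that the $j$-th and $(N-j)$-th quantiles are negatives of each other; the spacing of consecutive quantiles is larger near $\pm 2$, where the density vanishes.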



Theorem 1.3 (Gap universality for Wigner matrices). [44, Theorem 2.2] Let $H$ be a generalized real
symmetric or complex Hermitian Wigner matrix with subexponentially decaying matrix elements,
i.e. we assume that

$$\mathbb{P}\big(\sqrt{N}\,|h_{ij}| > x\big) \le C_0 \exp(-x^{\vartheta}) \qquad (1.17)$$

holds for any $x > 0$ with some positive constants $C_0, \vartheta$. Fix a positive number $\alpha > 0$, an integer
$n \in \mathbb{N}$ and a smooth, compactly supported function $O : \mathbb{R}^n \to \mathbb{R}$. There exist an $\varepsilon > 0$ and $C > 0$,
depending only on $C_0, \vartheta, \alpha$ and $O$ such that

$$\Big| \big[\mathbb{E} - \mathbb{E}^{\mu}\big]\, O\big( N(x_j - x_{j+1}), N(x_j - x_{j+2}), \dots, N(x_j - x_{j+n}) \big) \Big| \le C N^{-\varepsilon}, \qquad (1.18)$$

for any $j \in [\![\alpha N, (1-\alpha)N]\!]$ and for any sufficiently large $N \ge N_0$, where $N_0$ depends on all
parameters of the model, as well as on $n$ and $\alpha$. Here $\mathbb{E}$ and $\mathbb{E}^{\mu}$ denote the expectation with respect
to the Wigner ensemble $H$ and the Gaussian equilibrium measure (see (1.5) for the Hermitian case),
respectively.

More generally, for any $k, m \in [\![\alpha N, (1-\alpha)N]\!]$ we have

$$\Big| \mathbb{E}\, O\big( (N\varrho_k)(x_k - x_{k+1}), (N\varrho_k)(x_k - x_{k+2}), \dots, (N\varrho_k)(x_k - x_{k+n}) \big)
- \mathbb{E}^{\mu} O\big( (N\varrho_m)(x_m - x_{m+1}), (N\varrho_m)(x_m - x_{m+2}), \dots, (N\varrho_m)(x_m - x_{m+n}) \big) \Big| \le C N^{-\varepsilon}, \qquad (1.19)$$

where the local density $\varrho_k$ is defined by $\varrho_k := \varrho_{sc}(\gamma_k)$.

As it was already mentioned, the gap universality with a certain local averaging, i.e. for the
cumulative statistics of $N^{\varepsilon}$ consecutive gaps, follows directly from the universality of the correlation
functions, Theorem 1.2. The gap distribution for Gaussian random matrices, with a local averaging,
can then be explicitly expressed via a Fredholm determinant, see [17-19]. The first result for a single
gap, i.e. without local averaging, was only achieved recently in the special case of the Gaussian
unitary ensemble (GUE) in [83], which statement then easily implies the same results for complex
Hermitian Wigner matrices satisfying the four moment matching condition.



1.4.2 Log-gases 



In the case of invariant ensembles, it is well-known that for $V$ satisfying certain mild conditions
the sequence of one-point correlation functions, or densities, associated with $\mu = \mu_{\beta,V}^{(N)}$ from (1.10)
has a limit as $N \to \infty$ and the limiting equilibrium density $\varrho_V(s)$ can be obtained as the unique
minimizer of the functional

$$I(\nu) = \int_{\mathbb{R}} V(t)\nu(t)\,dt - \int_{\mathbb{R}}\int_{\mathbb{R}} \log|t - s|\,\nu(s)\nu(t)\,dt\,ds.$$

We assume that $\varrho = \varrho_V$ is supported on a single compact interval, $[A, B]$, and $\varrho \in C^2(A, B)$.
Moreover, we assume that $V$ is regular in the sense that $\varrho$ is strictly positive on $(A, B)$ and vanishes
as a square root at the endpoints, see (1.4) of [12]. It is known that these conditions are satisfied if,
for example, $V$ is strictly convex. In this case $\varrho_V$ satisfies the equation

$$\frac{1}{2} V'(t) = \int_{\mathbb{R}} \frac{\varrho_V(s)\,ds}{t - s} \qquad (1.20)$$

for any $t \in (A, B)$. For the Gaussian case, $V(x) = x^2/2$, the equilibrium density is given by the
semicircle law, $\varrho_V = \varrho_{sc}$, see (1.3).

The following result was proven in Corollary 2.2 of [11] for convex potential V and it was 
generalized in Theorem 1.2 of [12] for the non-convex case. 



Theorem 1.4 (Bulk universality of $\beta$-ensemble). Assume $V$ is real analytic with $\inf_{x\in\mathbb{R}} V''(x) > -\infty$.
Let $\beta > 0$. Consider the $\beta$-ensemble $\mu = \mu_{\beta,V}^{(N)}$ given in (1.10) and let $p_N^{(n)}$ denote the $n$-point
correlation functions of $\mu$, defined analogously to (1.4). For the Gaussian case, $V(x) = x^2/2$, the
correlation functions are denoted by $p_{G,N}^{(n)}$. Let $E \in (A, B)$ lie in the interior of the support of $\varrho$
and similarly let $E' \in (-2, 2)$ be inside the support of $\varrho_{sc}$. Let $O : \mathbb{R}^n \to \mathbb{R}$ be a smooth, compactly
supported function. Then for $b_N = N^{-1+\xi}$ with any $0 < \xi \le 1/2$ we have

$$\lim_{N\to\infty} \int d\alpha_1 \cdots d\alpha_n \, O(\alpha_1, \dots, \alpha_n) \Bigg[
\int_{E-b_N}^{E+b_N} \frac{dx}{2b_N} \frac{1}{\varrho(E)^n}\, p_N^{(n)}\Big( x + \frac{\alpha_1}{N\varrho(E)}, \dots, x + \frac{\alpha_n}{N\varrho(E)} \Big)
- \int_{E'-b_N}^{E'+b_N} \frac{dx}{2b_N} \frac{1}{\varrho_{sc}(E')^n}\, p_{G,N}^{(n)}\Big( x + \frac{\alpha_1}{N\varrho_{sc}(E')}, \dots, x + \frac{\alpha_n}{N\varrho_{sc}(E')} \Big)
\Bigg] = 0, \qquad (1.21)$$

i.e. the appropriately normalized correlation functions of the measure $\mu_{\beta,V}^{(N)}$ at the level $E$ in the
bulk of the limiting density asymptotically coincide with those of the Gaussian case. In particular,
they are independent of the value of $E$.



For the corresponding theorem on the single gap we need to define the classical location of the
$j$-th particle $\gamma_{j,V}$ by

$$\frac{j}{N} = \int_{A}^{\gamma_{j,V}} \varrho_V(x)\,dx, \qquad (1.22)$$

similarly to the quantiles $\gamma_j$ of the semicircle law, see (1.16). We set

$$\varrho_j^V := \varrho_V(\gamma_{j,V}), \quad \text{and} \quad \varrho_j := \varrho_{sc}(\gamma_j) \qquad (1.23)$$

to be the limiting densities at the classical location of the $j$-th particle. Our main theorem on the
$\beta$-ensembles is the following.


Theorem 1.5 (Gap universality for $\beta$-ensembles). [44, Theorem 2.3] Let $\beta \ge 1$ and $V$ be a real
analytic potential with $\inf V'' > -\infty$, such that $\varrho_V$ is supported on a single compact interval, $[A, B]$,
$\varrho_V \in C^2(A, B)$, and that $V$ is regular. Fix a positive number $\alpha > 0$, an integer $n \in \mathbb{N}$ and a smooth,
compactly supported function $O : \mathbb{R}^n \to \mathbb{R}$. Let $\mu = \mu_V = \mu_{\beta,V}^{(N)}$ be given by (1.10) and let $\mu_G$ denote
the same measure for the Gaussian case, $V(x) = \frac{1}{2}x^2$. Then there exist an $\varepsilon > 0$, depending only
on $\alpha$, $\beta$ and the potential $V$, and a constant $C$ depending on $O$ such that

$$\Big| \mathbb{E}^{\mu_V} O\big( (N\varrho_k^V)(x_k - x_{k+1}), (N\varrho_k^V)(x_k - x_{k+2}), \dots, (N\varrho_k^V)(x_k - x_{k+n}) \big)
- \mathbb{E}^{\mu_G} O\big( (N\varrho_m)(x_m - x_{m+1}), (N\varrho_m)(x_m - x_{m+2}), \dots, (N\varrho_m)(x_m - x_{m+n}) \big) \Big| \le C N^{-\varepsilon} \qquad (1.24)$$

for any $k, m \in [\![\alpha N, (1-\alpha)N]\!]$ and for any sufficiently large $N \ge N_0$, where $N_0$ depends on $V$, $\beta$,
as well as on $n$ and $\alpha$. In particular, the distribution of the rescaled gaps w.r.t. $\mu_V$ does not depend
on the index $k$ in the bulk.

We point out that Theorem 1.4 holds for any $\beta > 0$, but Theorem 1.5 requires $\beta \ge 1$. Most
likely this is only a technical restriction related to a certain condition in the De Giorgi-Nash-Moser
regularity theory that is the backbone of our proof.



1.5 Some remarks on the general strategy and on related results 
1.5.1 Strategy for the universality of correlation functions 

The proof of Theorem 1.2 consists of the following three steps, discussed in Sections 2, 3.1 and 3.2,
respectively. This three-step strategy was first introduced in [34].

Step 1. Local semicircle law and delocalization of eigenvectors: It states that the density of
eigenvalues is given by the semicircle law not only as a weak limit on macroscopic scales (1.3), but also
in a strong sense and down to short scales containing only $N^{\varepsilon}$ eigenvalues for all $\varepsilon > 0$. This will
imply the rigidity of eigenvalues, i.e., that the eigenvalues are near their classical location in the
sense to be made clear in Section 3.1. We also obtain precise estimates on the matrix elements of
the Green function which in particular imply complete delocalization of eigenvectors.

Step 2. Universality for Gaussian divisible ensembles: The Gaussian divisible ensembles are
complex or real Hermitian matrices of the form

$$H_t = e^{-t/2} H_0 + \sqrt{1 - e^{-t}}\, U,$$

where $H_0$ is a Wigner matrix and $U$ is an independent GUE/GOE matrix. The parametrization
of $H_t$ reflects that $H_t$ is most conveniently obtained by an Ornstein-Uhlenbeck process. There are
two methods and both methods imply the bulk universality of $H_t$ for $t = N^{-\tau}$ for the entire range
of $0 < \tau < 1$ with different estimates.
2a. Proposition 3.1 of [34], which uses an extension of Johansson's formula [56].
2b. Local ergodicity of the Dyson Brownian motion (DBM).
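
The choice of coefficients in the Gaussian divisible ensemble of Step 2 can be checked by a one-line variance computation. The sketch below assumes, as for standard Wigner matrices, that the entries $h_{ij}(0)$ of the initial matrix and the entries $u_{ij}$ of the Gaussian matrix are centered with the same variance $1/N$, and that the two matrices are independent; the symbols $h_{ij}(t)$ and $u_{ij}$ are introduced here only for illustration.

```latex
\mathbb{E}\,|h_{ij}(t)|^2
  = e^{-t}\,\mathbb{E}\,|h_{ij}(0)|^2 + \big(1 - e^{-t}\big)\,\mathbb{E}\,|u_{ij}|^2
  = e^{-t}\cdot\frac{1}{N} + \big(1 - e^{-t}\big)\cdot\frac{1}{N}
  = \frac{1}{N}.
```

The cross term vanishes by independence and centering. Thus the variance is preserved along the flow, and for $t = N^{-\tau}$ the Gaussian component carries a fraction $1 - e^{-t} \approx N^{-\tau}$ of it.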

The approach in 2a yields a slightly stronger estimate (no local averaging in the energy) than
the approach in 2b, but it works only in the complex Hermitian case. In these notes, we will focus
on 2b. As time evolves, the eigenvalues of $H_t$ evolve according to a system of stochastic differential
equations, the Dyson Brownian motion. The distribution of the eigenvalues of $H_t$ will be written
as $f_t\mu$, where $\mu$ is the equilibrium measure (1.5). We will study the evolution equation $\partial_t f_t = \mathcal{L} f_t$,
where $\mathcal{L}$ is the generator to the Dirichlet form $\int |\nabla f|^2 d\mu$. As time goes to infinity, $f_t$ converges
to a constant, i.e. to equilibrium. The key technical question is the speed to local equilibrium.

Step 3. Approximation by Gaussian divisible ensembles: It is a simple density argument in the
space of matrix ensembles which shows that for any probability distribution of the matrix elements
there exists a Gaussian divisible distribution with a small Gaussian component, as in Step 2, such
that the two associated Wigner ensembles have asymptotically identical local eigenvalue statistics.
The first implementation of this approximation scheme was via a reverse heat flow argument [34];
it was later replaced by the Green function comparison theorem [45] that was motivated by the four
moment matching condition of [81].

The proof of Theorem 1.4 consists of the following two steps that will be presented in Sections 4.1
and 4.2.

Step 1. Rigidity of eigenvalues. This establishes that the locations of the eigenvalues are not too
far from their classical locations $\gamma_{j,V}$ determined by the equilibrium density $\varrho_V$, see (1.22). At this
stage the analyticity of $V$ is necessary since we make use of the loop equation from Johansson [57]
and Shcherbina [78].

Step 2. Uniqueness of local Gibbs measures with logarithmic interactions. With the precision of
eigenvalue location estimates from Step 1 as an input, the eigenvalue spacing distributions are
shown to be given by the corresponding Gaussian ones. (We will take the uniqueness of the spacing
distributions as our definition of the uniqueness of Gibbs state.)

There are several similarities and differences between the proofs of Theorems 1.2 and 1.4. Both
start with rigidity estimates on eigenvalues and then establish that the local spacing distributions
are the same as in the Gaussian cases. The Gaussian divisible ensembles, which play a key role
in our theory for noninvariant ensembles, are completely absent for invariant ensembles. The key
connection between the two methods, however, is the usage of DBM (or its analogue) in the Steps
2. In Section 3.1 we will first present this idea.

1.5.2 Strategy for gap universality 

The proofs of Theorems 1.3 and 1.5 require several new ideas. The focus is to analyze the local
conditional measures $\mu_{\mathbf{y}}$ and $f_{t,\mathbf{y}}\mu_{\mathbf{y}}$ instead of the equilibrium measure $\mu$ and the DBM evolved
measure $f_t\mu$. They are obtained by fixing all but $\mathcal{K}$ consecutive points, denoted by $\mathbf{y}$. The local
measures are Gibbs measures on $\mathcal{K}$ points, denoted by $\mathbf{x}$, that are confined to an interval $J = J_{\mathbf{y}}$
determined by the boundary points of $\mathbf{y}$. The external potential, $V_{\mathbf{y}}$, of the local measure contains
not only the external potential $V$ from $\mu$, but also the interactions between $\mathbf{x}$ and $\mathbf{y}$.

The first step is again to establish rigidity, but this time with respect to the conditional
measures $\mu_{\mathbf{y}}$ and $(f_t\mu)_{\mathbf{y}} =: f_{t,\mathbf{y}}\mu_{\mathbf{y}}$, at least for most boundary conditions $\mathbf{y}$. Due to the logarithmic
interactions, $V_{\mathbf{y}}$ is not analytic any more and the loop equation is not available, but the rigidity
information can still be extracted from the rigidity with respect to the global measure with some
additional arguments.

In the second step, which is the key part of the argument, we establish the universality of the gap
distribution w.r.t. $\mu_{\mathbf{y}}$ by interpolating between $\mu_{\mathbf{y}}$ and $\mu_{\tilde{\mathbf{y}}}$ with two different boundary conditions $\mathbf{y}$
and $\tilde{\mathbf{y}}$. This amounts to estimating the correlation between a gap observable, say $O(x_i - x_{i+1})$, and
$V_{\mathbf{y}} - V_{\tilde{\mathbf{y}}}$. The correlations between particles in log-gases decay only logarithmically, i.e. extremely
slowly:

$$\frac{\langle x_i; x_j \rangle}{\sqrt{\langle x_i; x_i \rangle \langle x_j; x_j \rangle}} \sim \frac{1}{\log|i - j|} \qquad (1.25)$$

at least if $i, j$ are far from the boundaries. Here $\langle \cdot\, ; \cdot \rangle$ denotes the covariance with respect to $\mu_{\mathbf{y}}$.
The key observation is that the correlation between a gap $x_i - x_{i+1}$ and a particle $x_j$ decays much faster,

$$\frac{\langle x_i - x_{i+1}; x_j \rangle}{\sqrt{\langle x_i - x_{i+1}; x_i - x_{i+1} \rangle \langle x_j; x_j \rangle}} \sim \frac{1}{|i - j|}, \qquad (1.26)$$

because it is essentially the derivative in $i$ of (1.25). The decay of the gap-gap correlation is even
faster.

While the formulas (1.25)-(1.26) are plausible, their rigorous proof is extremely difficult due to
the very strong correlations in $\mu_{\mathbf{y}}$. We are able to prove a much weaker version of (1.26), practically
a decay of order $|i - j|^{-\varepsilon}$ for some small $\varepsilon > 0$, which is sufficient for our purposes. Even the proof
of this weaker decay requires quite heavy tools.

We start with a classical observation by Helffer and Sjöstrand [55] that the covariance of any
two observables $f, g$ with respect to a Gibbs measure $\mu = \exp(-\mathcal{H}(\mathbf{x}))d\mathbf{x}$ can be expressed as

$$\langle f(\mathbf{x}); g(\mathbf{x}) \rangle_{\mu} = \int_0^{\infty} \langle \mathbf{h}_t(\mathbf{x}), \nabla g(\mathbf{x}) \rangle_{\mu}\, dt, \qquad
\partial_t \mathbf{h}_t = -(\mathcal{L} + \mathcal{H}'')\mathbf{h}_t, \quad \mathbf{h}_0 = \nabla f, \qquad (1.27)$$

where $\mathcal{L}$ is the generator to the Dirichlet form $\int |\nabla f|^2 d\mu$ and $\mathcal{H}''$ is the Hessian of the
Hamiltonian. The generator in the heat equation in (1.27) creates a time dependent random
environment $\mathbf{x}(t)$ that makes the matrix entries $(x_i - x_j)^{-2}$ of $\mathcal{H}''$ time dependent. The solution $\mathbf{h}_t$
to the equation in (1.27) can thus be represented as a random walk in a time dependent random
environment, where the jump rate from site $i$ to $j$ is given by $(x_i(t) - x_j(t))^{-2}$ at time $t$. On large
scales and for typical realizations of $\mathbf{x}(t)$, this jump rate is close to a discretization of the $\sqrt{-\Delta}$
operator. A discrete version of De Giorgi-Nash-Moser partial regularity theory [14] then guarantees
that the neighboring components of $\mathbf{h}_t$ are close, which renders the covariance $\langle \mathbf{h}_t(\mathbf{x}), \nabla g(\mathbf{x}) \rangle_{\mu}$
small, assuming that $g$ is a function of $x_i - x_{i+1}$. In more general terms, the correlation decay
(1.26) with $|i - j|^{-\varepsilon}$ is equivalent to the Hölder regularity of a discrete parabolic PDE with random
coefficients. This approach has a considerable potential to study log-gases since it connects the
problem with one of the deepest phenomena in PDE.

Finally, in the third step, we pass the information on the universality of the gap w.r.t. local
measures to the global ones. For the invariant ensemble this step is fairly straightforward, while for
the Wigner ensemble we need to use an approximation step similar to Step 3 in Section 1.5.1.



1.5.3 Historical remarks 

The method of the proof of Theorem 1.2 is extremely general and the result holds for a much larger
class of matrix ensembles with independent entries. Adjacency matrices of the Erdős-Rényi graphs
are also covered as long as the matrix is not too sparse, namely more than $N^{2/3}$ entries of each
row are non-zero on average [31, 32]. Although Theorem 1.2 in its current form was proved in [32],
the key ideas have been developed through several important steps in [M l l4T) jl4l)H4"T]. In particular,
the Wigner-Dyson-Gaudin-Mehta (WDGM) conjecture for complex Hermitian matrices in the form
of (1.15) was first proved in Theorem 1.1 of [34]. This result holds whenever the distributions of
the matrix elements are smooth. The smoothness requirement for (1.15) was partially removed
in [81] and completely removed in [35] but only in the averaged convergence sense (1.14). For a
general distribution (1.15) was proved in Theorem 5 in [82]. Although the proof in [82] took a
slightly different path, this generalization is an immediate corollary of previous results [33]. These
arguments are restricted to the complex Hermitian case since they still use some explicit formula.

The WDGM conjecture for real symmetric matrices in the averaged form of (1.14) was resolved
in [30] (a special case, under a restrictive third moment matching condition, was treated in [81]).
In [40], a novel idea based on Dyson Brownian motion was introduced. The most difficult case,
the real symmetric Bernoulli matrices, was solved in [46], where a "Fluctuation Averaging Lemma"
(Theorem 2.16 of the current paper) exploiting cancellation of matrix elements of the Green function
was first introduced. A more detailed historical review on Theorem 1.2 was given in Section 11
of [42].

For $\beta = 2$, Theorem 1.4 was proved for very general potentials; the best results for $\beta = 1, 4$
|18 y 59 1 l7H] are still restricted to analytic $V$ with additional conditions. Prior to Theorem 1.4 there
was no result for general $\beta$, except for the Gaussian case [84].

Given the historical importance of the Wigner surmise, it is somewhat surprising that single
gap universality did not receive much attention until very recently. This is probably because our
understanding of the Wigner-Dyson-Gaudin-Mehta universality became sufficiently sophisticated
only in the last few years to realize the subtle difference between fixed energy and fixed label
universality. In fact, even the GUE case was not known until the very recent paper by Tao [83].
In this work, the complex Hermitian Wigner case was also covered under the condition that the
distribution matches that of the GUE to fourth order. Theorem 1.3 is considerably more general,
as it applies to any symmetry class and does not require moment matching. Finally, the single
gap universality of the invariant ensembles has not been considered before Theorem 1.5.

1.5.4 What will not be discussed 

In these lecture notes we focus on the four universality results, Theorems 1.2-1.5, and the necessary
background material. There are many related questions on random matrix universality and several
of them can be studied with the methods we present here. Here we just list them and give a few
relevant references.

• Edge universality for Wigner matrices. See Section 9 of [42] for a summary and also the
recent paper [60] that gives the optimal moment condition.

• Universality of eigenvectors. See [58].

• Universality for sample covariance matrices. See [41, 73, 74].

• Sparse matrices and adjacency matrices of Erdős-Rényi graphs. See Section 10 of [42].

1.5.5 Structure of the lecture notes 

A large part of the presentation in these lecture notes is borrowed from other papers and reviews
written on the subject [26, 30, 42] and sometimes whole paragraphs of the original articles are
taken over verbatim. The overlap is especially large with the review paper [42]: Sections 3.1-4.2
on the Dyson Brownian motion, the Green function comparison theorem and on the analysis of the
$\beta$-ensemble are repeated without much change. The local semicircle law (Section 2) is presented
here more generally than in [42], following the recent paper [30]. For pedagogical reasons, we will
give the proof in a simplified form in Section 2.4 and we only comment on the general proof in
Section 2.5. Sections 2.3.3-2.3.4 cover new results on random band matrices based upon the recent
work [33]. Section 5 presents an extensive outline of the proofs of Theorems 1.3 and 1.5 on the
single gap universality following the very recent paper [44].

We will use the convention that $C$ and $c$ denote generic positive constants whose actual values
are irrelevant and may change from line to line. For two $N$-dependent quantities $A_N$ and $B_N$ we
use the notation $A_N \asymp B_N$ to express that $c \le A_N/B_N \le C$.

Acknowledgement. The results in these lecture notes were obtained in collaboration with Horng- 
Tzer Yau, Benjamin Schlein, Jun Yin, Antti Knowles and Paul Bourgade and in some work, also 
with Jose Ramirez and Sandrine Peche. This article reports the joint progress with these authors. 



2 Local semicircle law for general Wigner-type matrices 
2.1 Setup and the main results 

Let $(h_{ij} : i \le j)$ be a family of independent, complex-valued random variables satisfying $\mathbb{E} h_{ij} = 0$
and $h_{ii} \in \mathbb{R}$ for all $i$. For $i > j$ we define $h_{ij} := \bar{h}_{ji}$, and denote by $H = (h_{ij})_{i,j=1}^{N}$ the $N \times N$
matrix with entries $h_{ij}$. By definition, $H$ is Hermitian: $H = H^*$. (Note that this setup also
includes the case of a real symmetric matrix $H$.) Such ensembles will be called general Wigner-type
matrices. Note that we allow for the matrix elements having different distributions. This class
of matrices is a natural generalization of the standard real symmetric Wigner matrices for which
$h_{ij} \in \mathbb{R}$ are identically distributed, and the standard complex Hermitian Wigner matrices for which
the off-diagonal elements $h_{ij} \in \mathbb{C}$ are identically distributed and the diagonal elements $h_{ii} \in \mathbb{R}$ have
their own, but still identical distribution.

The fundamental data of the model is the $N \times N$ matrix of variances $S = (s_{ij})$, where

$$s_{ij} := \mathbb{E}\,|h_{ij}|^2.$$

We introduce the parameter $M := [\max_{i,j} s_{ij}]^{-1}$ that expresses the maximal size of $s_{ij}$:

$$s_{ij} \le M^{-1} \qquad (2.1)$$

for all $i$ and $j$. We regard $N$ as the fundamental parameter and $M = M_N$ as a function of $N$.
We assume that

$$N^{\delta} \le M \le N \qquad (2.2)$$

for some fixed $\delta > 0$. We assume that $S$ is (doubly) stochastic:

$$\sum_j s_{ij} = 1 \qquad (2.3)$$

for all $i$. For standard Wigner matrices, $h_{ij}$ are identically distributed, hence $s_{ij} = \frac{1}{N}$ and $M = N$.
In this presentation, we allow for the matrix elements having different distributions but
independence (up to the Hermitian symmetry) is always assumed.

Example 2.1. Random band matrices are characterized by translation invariant variances of the
form

$$s_{ij} = \frac{1}{W} f\Big( \frac{|i - j|_N}{W} \Big), \qquad (2.4)$$

where $f$ is a smooth, symmetric probability density on $\mathbb{R}$, $W$ is a large parameter, called the
band width, and $|i - j|_N$ denotes the periodic distance on the discrete torus $\mathbb{T}$ of length $N$. The
generalization in higher spatial dimensions is straightforward; in this case the rows and columns of
$H$ are labelled by a discrete $d$-dimensional torus of length $L$ with $N = L^d$.
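
As an illustration of Example 2.1, the following minimal sketch builds a band-type variance matrix on the discrete torus and reads off the parameter $M = [\max_{i,j} s_{ij}]^{-1}$. Two choices here are assumptions made for the example and are not taken from the text: the profile $f$ is the standard Gaussian density, and each row is normalized explicitly so that the row sums in (2.3) equal 1 exactly (the formula (2.4) achieves this only up to a Riemann-sum error). The names `periodic_distance` and `band_variances` are hypothetical.

```python
import math


def periodic_distance(i: int, j: int, n: int) -> int:
    """Distance between i and j on the discrete torus of length n."""
    d = abs(i - j) % n
    return min(d, n - d)


def band_variances(n: int, w: float, profile) -> list:
    """n x n matrix with s_ij proportional to profile(|i-j|_n / w), rows summing to 1."""
    weights = [profile(periodic_distance(0, k, n) / w) for k in range(n)]
    total = sum(weights)
    row0 = [x / total for x in weights]
    # Translation invariance: row i is row 0 shifted cyclically by i.
    return [[row0[(j - i) % n] for j in range(n)] for i in range(n)]


def gauss(x: float) -> float:
    """Standard Gaussian density, used as the band profile f (an illustrative choice)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


N, W = 50, 5
S = band_variances(N, W, gauss)
M = 1.0 / max(max(row) for row in S)  # M = [max_ij s_ij]^{-1}
```

With these choices $M$ is close to $W/f(0) = W\sqrt{2\pi}$, so $M$ is comparable to the band width $W$ rather than to $N$.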

For convenience we assume that the normalized entries

$$\zeta_{ij} := (s_{ij})^{-1/2} h_{ij} \qquad (2.5)$$

have a polynomial decay of arbitrary high degree, i.e. for all $p \in \mathbb{N}$ there is a constant $\mu_p$ such that

$$\mathbb{E}\,|\zeta_{ij}|^p \le \mu_p \qquad (2.6)$$

for all $N$, $i$, and $j$. We make this assumption to streamline notation, but in fact, our results hold,
with the same proof, provided (2.6) is valid for some large but fixed $p$. If we strengthen it to
uniform subexponential decay, (1.17), then certain estimates will become stronger. In this paper
we work with (2.6) for simplicity, but we remark that most of our previous work used (1.17).

Throughout the following we use a spectral parameter $z \in \mathbb{C}$ satisfying $\mathrm{Im}\, z > 0$. We shall use
the notation

$$z = E + i\eta$$

without further comment. The eigenvalues of $H$ in the $N \to \infty$ limit are distributed by the
celebrated Wigner semicircle law,

$$\varrho(x) = \varrho_{sc}(x) := \frac{1}{2\pi}\sqrt{(4 - x^2)_+}, \qquad (2.7)$$

and its Stieltjes transform at spectral parameter $z$ is defined by

$$m(z) := \int_{\mathbb{R}} \frac{\varrho(x)}{x - z}\,dx. \qquad (2.8)$$

To avoid confusion, we remark that $m$ was denoted by $m_{sc}$ and $\varrho$ by $\varrho_{sc}$ in most of our previous
papers. In this section we drop the subscript referring to "semicircle". It is well known that the
Stieltjes transform $m$ is the unique solution of

$$m(z) + \frac{1}{m(z)} + z = 0 \qquad (2.9)$$

with $\mathrm{Im}\, m(z) > 0$ for $\mathrm{Im}\, z > 0$. Thus we have

$$m(z) = \frac{-z + \sqrt{z^2 - 4}}{2}. \qquad (2.10)$$
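
The branch of the square root in (2.10) is fixed by the requirement $\mathrm{Im}\, m(z) > 0$. A minimal numerical sketch of this selection follows; the function names `m_sc` and `rho_sc` are illustrative and do not come from the text. Instead of tracking the branch cut of the complex square root, it computes both roots of the quadratic $m^2 + zm + 1 = 0$ equivalent to (2.9) and keeps the one in the upper half plane. The product of the two roots is 1, so for $\mathrm{Im}\, z > 0$ exactly one of them has positive imaginary part.

```python
import cmath
import math


def m_sc(z: complex) -> complex:
    """Stieltjes transform of the semicircle law for Im z > 0.

    Returns the root of m^2 + z*m + 1 = 0 with positive imaginary part.
    """
    root = cmath.sqrt(z * z - 4.0)
    m_plus = (-z + root) / 2.0
    m_minus = (-z - root) / 2.0
    return m_plus if m_plus.imag > 0 else m_minus


def rho_sc(x: float) -> float:
    """Semicircle density (2.7)."""
    return math.sqrt(max(4.0 - x * x, 0.0)) / (2.0 * math.pi)


z = 0.5 + 0.01j
m = m_sc(z)
residual = abs(m + 1.0 / m + z)  # should vanish up to rounding, by (2.9)

# Close to the real axis, Im m(E + i*eta) approaches pi * rho(E).
E, eta = 0.3, 1e-6
density_gap = abs(m_sc(complex(E, eta)).imag - math.pi * rho_sc(E))
```

Here `residual` checks the self-consistent equation (2.9), and `density_gap` illustrates how the imaginary part of $m$ near the real axis recovers the density.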
We define the resolvent of $H$ through

$$G(z) := (H - z)^{-1},$$

and denote its entries by $G_{ij}(z)$. The Stieltjes transform of the empirical spectral measure

$$\varrho_N(dx) = \frac{1}{N} \sum_{\alpha} \delta(\lambda_\alpha - x)\,dx$$

for the eigenvalues $\lambda_1 \le \lambda_2 \le \dots \le \lambda_N$ of $H$ is

$$m_N(z) := \int_{\mathbb{R}} \frac{\varrho_N(dx)}{x - z} = \frac{1}{N} \mathrm{Tr}\, G(z). \qquad (2.11)$$

An important parameter of the model is

$$\Gamma(z) := \Big\| \frac{1}{1 - m^2(z) S} \Big\|_{\infty \to \infty}. \qquad (2.12)$$

Note that $S$, being a stochastic matrix, satisfies $-1 \le S \le 1$, and 1 is an eigenvalue with eigenvector
$\mathbf{e} = N^{-1/2}(1, 1, \dots, 1)$, $S\mathbf{e} = \mathbf{e}$. We assume that 1 is simple for convenience. Another important
parameter is

$$\tilde\Gamma(z) := \Big\| \frac{1}{1 - m^2(z) S} \Big|_{\mathbf{e}^\perp} \Big\|_{\infty \to \infty}, \qquad (2.13)$$

i.e. the norm of $(1 - m^2 S)^{-1}$ restricted to the subspace orthogonal to the constants. Clearly $\tilde\Gamma \le \Gamma$.
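For standard Wigner matrices we easily obtain that

$$\Gamma(z) \asymp \frac{1}{|1 - m^2(z)|} \asymp \frac{1}{\sqrt{\kappa_E + \eta}}, \qquad \tilde\Gamma(z) = 1, \qquad (2.14)$$

where $\kappa_E := \big||E| - 2\big|$ denotes the distance of $E$ to the spectral edges. For generalized Wigner
matrices (Definition 1.1) essentially the same relations hold:

$$\Gamma(z) \asymp \frac{1}{|1 - m^2(z)|} \asymp \frac{1}{\sqrt{\kappa_E + \eta}}, \qquad \tilde\Gamma(z) \asymp 1. \qquad (2.15)$$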

The following definition introduces a notion of a high-probability bound that is suited for our
purposes.

Definition 2.2 (Stochastic domination). Let

$$X = \big( X^{(N)}(u) : N \in \mathbb{N},\, u \in U^{(N)} \big), \qquad Y = \big( Y^{(N)}(u) : N \in \mathbb{N},\, u \in U^{(N)} \big)$$

be two families of nonnegative random variables, where $U^{(N)}$ is a possibly $N$-dependent parameter
set. We say that $X$ is stochastically dominated by $Y$, uniformly in $u$, if for all (small) $\varepsilon > 0$ and
(large) $D > 0$ we have

$$\sup_{u \in U^{(N)}} \mathbb{P}\Big[ X^{(N)}(u) > N^{\varepsilon}\, Y^{(N)}(u) \Big] \le N^{-D}$$

for large enough $N \ge N_0(\varepsilon, D)$. Unless stated otherwise, throughout this paper the stochastic
domination will always be uniform in all parameters apart from the parameter $\delta$ in (2.2) and the
sequence of constants $\mu_p$ in (2.6); thus, $N_0(\varepsilon, D)$ also depends on $\delta$ and $\mu_p$. If $X$ is stochastically
dominated by $Y$, uniformly in $u$, we use the notation $X \prec Y$. Moreover, if for some complex family
$X$ we have $|X| \prec Y$ we also write $X = O_\prec(Y)$.

For example, using Chebyshev's inequality and (2.6) one easily finds that

$$|h_{ij}| \prec (s_{ij})^{1/2} \qquad (2.16)$$

uniformly in $i$ and $j$, so that we may also write $h_{ij} = O_\prec((s_{ij})^{1/2})$. An easy exercise shows that the
relation $\prec$ satisfies the familiar algebraic rules of order relations, e.g. such relations can be added
and multiplied. The definition of $\prec$ with the polynomial factors $N^{\varepsilon}$ and $N^{-D}$ is tailored for the
assumption (2.6). We remark that if (1.17) is assumed, a stronger form of stochastic domination
can be introduced but we will not pursue this direction here.
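
The Chebyshev argument behind the bound on $|h_{ij}|$ can be spelled out in one line. The sketch below uses only the normalization (2.5) and the moment bound (2.6); the choice of the exponent $p$ is the only freedom. For given $\varepsilon > 0$ and $D > 0$, pick an integer $p > D/\varepsilon$:

```latex
\mathbb{P}\Big[\, |h_{ij}| > N^{\varepsilon} (s_{ij})^{1/2} \Big]
  = \mathbb{P}\Big[\, |\zeta_{ij}| > N^{\varepsilon} \Big]
  \le \frac{\mathbb{E}\,|\zeta_{ij}|^{p}}{N^{\varepsilon p}}
  \le \mu_p\, N^{-\varepsilon p}
  \le N^{-D}
  \qquad \text{for all sufficiently large } N .
```

The last step holds as soon as $N^{\varepsilon p - D} \ge \mu_p$, and the threshold depends only on $\varepsilon$, $D$ and the constants $\mu_p$, so the bound is uniform in $i$ and $j$, as the definition of stochastic domination requires.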
Since

$$\lim_{\eta \to 0+} \mathrm{Im}\, m(E + i\eta) = \pi \varrho(E),$$

the convergence of the Stieltjes transform $m_N(z)$ to $m(z)$ as $N \to \infty$ will show that the empirical
local density of the eigenvalues around the energy $E$ in a window of size $\eta$ converges to the semicircle
law $\varrho(E)$. Therefore the key task is to control $m_N(z)$ for small $\eta = \mathrm{Im}\, z$.

We now define the lower threshold for $\eta$ that depends on the energy $E \in [-10, 10]$:

$$\eta_E := \min\Big\{ \eta : \frac{1}{M\eta} \le \min\Big\{ \frac{M^{-\gamma}}{\tilde\Gamma(z)^3}, \frac{M^{-2\gamma}}{\tilde\Gamma(z)^4\, \mathrm{Im}\, m(z)} \Big\} \text{ for all } z \in [E + i\eta, E + 10i] \Big\}. \qquad (2.17)$$

Here $\gamma > 0$ is a parameter that can be chosen arbitrarily small; for all practical purposes the reader
can neglect it. For generalized Wigner matrices, $M \asymp N$, from (2.15) we have

$$\eta_E \le C N^{-1+2\gamma},$$

i.e. we will get the local semicircle law on the smallest possible scale $\eta \gg N^{-1}$, modulo a polynomial
correction with an arbitrary small exponent. We remark that if we assume subexponential decay
(1.17) instead of the polynomial decay (2.6), then the small polynomial correction can be replaced
with a logarithmic correction factor.

Finally we define our fundamental control parameter

$$\Pi(z) := \sqrt{\frac{\mathrm{Im}\, m(z)}{M\eta}} + \frac{1}{M\eta}. \qquad (2.18)$$

We can now state the main result of this section, which in this form appeared in [30].

Theorem 2.3 (Local semicircle law). Uniformly in the energy $|E| \le 10$ and $\eta \in [\eta_E, 10]$ we have the
bounds

$$\big| G_{ij}(z) - \delta_{ij}\, m(z) \big| \prec \Pi(z) = \sqrt{\frac{\mathrm{Im}\, m(z)}{M\eta}} + \frac{1}{M\eta}, \qquad z = E + i\eta, \qquad (2.19)$$

uniformly in $i, j$, as well as

$$\big| m_N(z) - m(z) \big| \prec \frac{1}{M\eta}. \qquad (2.20)$$

We point out two remarkable features of these bounds. The error term for the resolvent entries
behaves essentially as $(M\eta)^{-1/2}$, with an improvement near the edges where $\mathrm{Im}\, m$ vanishes. The
error bound for the Stieltjes transform, i.e. for the average of the diagonal resolvent entries, is one
order better, $(M\eta)^{-1}$, but without improvement near the edge.
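
The improvement near the edges can be seen by evaluating the control parameter $\Pi(z) = \sqrt{\mathrm{Im}\, m(z)/(M\eta)} + 1/(M\eta)$ numerically. The following is a minimal sketch; the function names `m_sc` and `control_parameter` and the sample values of $M$ and $\eta$ are illustrative assumptions, not taken from the text.

```python
import cmath
import math


def m_sc(z: complex) -> complex:
    """Root of m + 1/m + z = 0 with positive imaginary part (for Im z > 0)."""
    root = cmath.sqrt(z * z - 4.0)
    m_plus = (-z + root) / 2.0
    m_minus = (-z - root) / 2.0
    return m_plus if m_plus.imag > 0 else m_minus


def control_parameter(z: complex, M: float) -> float:
    """Pi(z) = sqrt(Im m(z) / (M * eta)) + 1 / (M * eta), with eta = Im z."""
    eta = z.imag
    return math.sqrt(m_sc(z).imag / (M * eta)) + 1.0 / (M * eta)


M = 10**4
eta = 0.01
pi_bulk = control_parameter(complex(0.0, eta), M)   # energy E = 0, middle of the bulk
pi_edge = control_parameter(complex(2.0, eta), M)   # energy E = 2, spectral edge
mean_bound = 1.0 / (M * eta)                        # size of the bound on m_N - m
```

In this example `pi_edge` is smaller than `pi_bulk` because $\mathrm{Im}\, m$ is of order $\sqrt{\eta}$ at the edge and of order 1 in the bulk, while `mean_bound` is the same at both energies.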

Various local semicircle laws have a long history. For standard Wigner matrices (i.e. $M = N$
and $\tilde\Gamma = 1$), the optimal threshold for the smallest possible $\eta \gg 1/N$ was first achieved in [38]
in the bulk after an intermediate result on scale $\eta \gg N^{-2/3}$ in [37]. The first effective result near the
edge was given in [39]. The optimal power $(N\eta)^{-1}$ for $m_N - m$ in the bulk was first obtained
in [46] where the first version of the fluctuation averaging mechanism appeared. The optimal
behavior near the edge was first derived in [47]. The case of $M \ll N$ was first studied in [45]
where the threshold $\eta \gg 1/M$ in the bulk spectrum, $|E| < 2$, was achieved. The optimal
power $(M\eta)^{-1}$ is proved in [30], where the technique of [45] was combined with the fluctuation
averaging mechanism. The edge behavior, i.e. the deterioration of the threshold $\eta_E$ near the edge,
has also been extensively studied in [30] and it led to the power $-3$ of $\tilde\Gamma$ in the definition of $\eta_E$,
but it is yet unclear whether this power is optimal.

In Section 2.3 we demonstrate that all proofs of the local semicircle law rely on some version of
a self-consistent equation. At the beginning this was a scalar equation for $m_N$. The self-consistent
vector equation for $v_i = G_{ii} - m$ (see Section 2.3.2) first appeared in [45]. This allowed us to
deviate from the identical distributions for $h_{ij}$ and opened up the route to estimates on individual
resolvent matrix elements. Finally, the self-consistent matrix equation for $\mathbb{E}|G_{xy}|^2$ first appeared
in [33] and it yielded the diffusion profile for the resolvent (Section 2.3.4).

We now list a few consequences of the local semicircle law. It is an elementary property of the
Stieltjes transform that once $m_N \approx m$ is established for all spectral parameters $z$ with $\mathrm{Im}\, z \ge \eta_E$,
then $\varrho_N$ and $\varrho$ coincide on scales larger than $\eta_E$ near the energy $E$. This means that $\int f \varrho_N \to \int f \varrho$
for test functions on scale $\eta_E$, i.e. $|f'| \ll 1/\eta_E$. In particular, if $m_N(z) \to m(z)$ uniformly on the
half planes $\{z : \mathrm{Im}\, z \ge \varepsilon\}$ for any fixed $\varepsilon$, then $\varrho_N$ converges to $\varrho$ weakly. The bound (2.20) asserts
much more: it identifies $\varrho_N$ with $\varrho$ on scales of order $1/M$. In the standard Wigner case, $M = N$,
it is basically the optimal scale since below scale $1/N$ the empirical measure $\varrho_N$ strongly fluctuates
due to individual eigenvalues.

Once the local density is identified, we can deduce results on the location of individual
eigenvalues and on the counting function. Here we formulate the corresponding statements only for the
simpler case when $s_{ij}$ is comparable with $1/N$ as they were stated in [47] (apart from the fact that
in [47] a subexponential decay was assumed). The precise results in the general case are somewhat
more complicated and they can be found in [30].

Let $\gamma_\alpha = \gamma_{\alpha,N}$ denote the location of the $\alpha$-th point under the semicircle law, i.e., $\gamma_\alpha$ is defined
by

$$N \int_{-\infty}^{\gamma_\alpha} \varrho(x)\,dx = \alpha, \qquad 1 \le \alpha \le N. \qquad (2.21)$$

We will call $\gamma_\alpha$ the classical location of the $\alpha$-th point. Furthermore, for any real energy $E$, let

$$n_N(E) := \frac{1}{N} \#\{\lambda_\alpha \le E\}, \qquad n(E) := \int_{-\infty}^{E} \varrho(x)\,dx$$

be the empirical counting function of the eigenvalues and its classical counterpart.

Corollary 2.4 (Rigidity of eigenvalues and limit of the counting function). [47, Theorem 2.2] For
generalized Wigner matrices (Definition 1.1) we have

$$|\lambda_\alpha - \gamma_\alpha| \prec \frac{1}{N^{2/3}\, \hat\alpha^{1/3}}, \qquad \hat\alpha := \min\{\alpha, N + 1 - \alpha\}, \qquad (2.22)$$

uniformly in $\alpha \in \{1, 2, \dots, N\}$. Furthermore,

$$|n_N(E) - n(E)| \prec \frac{1}{N} \qquad (2.23)$$

uniformly in $E \in \mathbb{R}$.

We remark that under the stronger decay condition (1.17) instead of (2.6), the $N^{\varepsilon}$ factors
implicitly present in the notation $\prec$ can be improved to logarithmic factors, see [47].

Corollary 2.4 is a simple consequence of the Helffer-Sjöstrand formula which translates
information on the Stieltjes transform of the empirical measure first to the counting function and then
to the locations of eigenvalues. The formula yields the representation

$$f(\lambda) = \frac{1}{2\pi} \int_{\mathbb{R}^2} \frac{\partial_{\bar z} \tilde f(x + iy)}{\lambda - x - iy}\,dx\,dy
= \frac{1}{2\pi} \int_{\mathbb{R}^2} \frac{iy f''(x)\chi(y) + i\big(f(x) + iy f'(x)\big)\chi'(y)}{\lambda - x - iy}\,dx\,dy \qquad (2.24)$$

for any real valued $C^2$ function $f$ on $\mathbb{R}$, where $\chi(y)$ is any smooth cutoff function with bounded
derivatives and supported in $[-1, 1]$ with $\chi(y) = 1$ for $|y| \le 1/2$. In the applications, $f$ will be
a smoothed version of the characteristic functions of spectral intervals so that $\sum_j f(\lambda_j)$ counts
eigenvalues in that interval. From (2.24) we have

$$\frac{1}{N} \sum_j f(\lambda_j) = \frac{1}{2\pi} \int_{\mathbb{R}^2} \Big( iy f''(x)\chi(y) + i\big(f(x) + iy f'(x)\big)\chi'(y) \Big)\, m_N(x + iy)\,dx\,dy,$$

and then $m_N$ can be approximated by $m(x + iy)$. The details of the argument can be found in [36].
Once (2.23) is established, it is an elementary argument to translate it into the rigidity of the
eigenvalues (2.22). The powers 2/3 and 1/3 in (2.22) stem from the fact that $\varrho(x)$ has a square
root singularity near the spectral edges $x \approx \pm 2$, therefore $n(x) \sim (x + 2)_+^{3/2}$ for $x$ near $-2$.

Although Wigner's semicircle law and its local version only concern $m_N = \frac{1}{N}\operatorname{Tr} G$, we remark
that the resolvent matrix elements $G_{ii}$ and $G_{ij}$ also carry important information. For example, a
good bound on $G_{ii}$ implies delocalization of the eigenvectors. Indeed, by the spectral decomposition,
we have

$$ \operatorname{Im} G_{ii}(z) = \sum_{\alpha} \frac{\eta\,|u_\alpha(i)|^2}{(\lambda_\alpha - E)^2 + \eta^2}, \qquad z = E + i\eta, $$

where $u_\alpha = (u_\alpha(1), \dots, u_\alpha(N))$ is the (normalized) eigenvector belonging to the eigenvalue $\lambda_\alpha$.
Choosing the energy $E$ in the $\eta$-vicinity of $\lambda_\alpha$, we obtain $|u_\alpha(i)|^2 \le C\eta \operatorname{Im} G_{ii}(z)$. Therefore, if
$|G_{ii}(z)| \le C$ can be shown uniformly for any $z$ with $\operatorname{Im} z \ge \eta(N)$ for some $N$-dependent threshold
$\eta(N)$, then we conclude

$$ \max_\alpha \|u_\alpha\|_\infty^2 \le C\eta(N). $$

In the Wigner case, the threshold $\eta(N)$ is almost $1/N$, thus we obtain the complete delocalization
of the eigenvectors:

Corollary 2.5. For the $\ell^2$-normalized eigenvectors $u_\alpha$, $\alpha = 1,2,\dots,N$, of the standard Wigner
matrix, we have

$$ \|u_\alpha\|_\infty \prec N^{-1/2}. \tag{2.25} $$

This result was first proven in [38] without bounding $G_{ii}$. In contrast to the argument in [38],
the proof via $G_{ii}$ can also be easily extended to a general class of Wigner-type matrices. For
example, an elementary argument shows that if there are two positive constants $c$ and $C$ such that
$c \le N s_{ij} \le C$ for all $i,j$, then $\widetilde\Gamma(z) \le C$ uniformly in $z$, thus $\widetilde\eta_E \le CN^{-1+\gamma}$ and (2.25) holds.

2.2 Tools 

In this subsection we collect some basic definitions and facts. 
Definition 2.6 (Minors). For $T \subset \{1, \dots, N\}$ we define $H^{(T)}$ by

$$ (H^{(T)})_{ij} := \mathbf{1}(i \notin T)\,\mathbf{1}(j \notin T)\,h_{ij}. $$

Moreover, we define the resolvent of $H^{(T)}$ and its normalized trace through

$$ G^{(T)}_{ij}(z) := (H^{(T)} - z)^{-1}_{ij}, \qquad m^{(T)}(z) := \frac{1}{N}\operatorname{Tr} G^{(T)}(z). $$

We also set

$$ \sum_i^{(T)} := \sum_{i \,:\, i \notin T}. $$

Definition 2.7 (Partial expectation and independence). Let $X = X(H)$ be a random variable. For
$i \in \{1, \dots, N\}$ define the operations $P_i$ and $Q_i$ through

$$ P_i X := \mathbb{E}(X \mid H^{(i)}), \qquad Q_i X := X - P_i X. $$

We call $P_i$ partial expectation in the index $i$. Moreover, we say that $X$ is independent of $T \subset
\{1, \dots, N\}$ if $X = P_i X$ for all $i \in T$.

We shall frequently make use of Schur's well-known complement formula, which we write as

$$ \frac{1}{G^{(T)}_{ii}} = h_{ii} - z - \sum_{k,l}^{(Ti)} h_{ik}\, G^{(Ti)}_{kl}\, h_{li}, \tag{2.26} $$

where $i \notin T \subset \{1, \dots, N\}$.

The following resolvent identities form the backbone of all of our calculations. The idea behind
them is that a resolvent matrix element $G_{ij}$ depends strongly on the $i$-th and $j$-th columns of $H$, but
weakly on all other columns. The first identity determines how to make a resolvent matrix element
$G_{ij}$ independent of an additional index $k \ne i,j$. The second identity expresses the dependence of
a resolvent matrix element $G_{ij}$ on the matrix elements in the $i$-th or in the $j$-th column of $H$. We
added a third identity that relates sums of off-diagonal resolvent entries with a diagonal one. The
proofs are elementary.

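Lemma 2.8 (Resolvent identities). For any real or complex Hermitian matrix $H$ and $T \subset \{1, \dots, N\}$
the following identities hold. If $i,j,k \notin T$ and $i,j \ne k$ then

$$ G^{(T)}_{ij} = G^{(Tk)}_{ij} + \frac{G^{(T)}_{ik} G^{(T)}_{kj}}{G^{(T)}_{kk}}, \qquad \frac{1}{G^{(T)}_{ii}} = \frac{1}{G^{(Tk)}_{ii}} - \frac{G^{(T)}_{ik} G^{(T)}_{ki}}{G^{(T)}_{ii}\, G^{(Tk)}_{ii}\, G^{(T)}_{kk}}. \tag{2.27} $$

If $i,j \notin T$ satisfy $i \ne j$ then

$$ G^{(T)}_{ij} = -G^{(T)}_{ii} \sum_k^{(Ti)} h_{ik}\, G^{(Ti)}_{kj} = -G^{(T)}_{jj} \sum_k^{(Tj)} G^{(Tj)}_{ik}\, h_{kj}. \tag{2.28} $$

Moreover, we have

$$ \sum_j |G^{(T)}_{ij}|^2 = \frac{1}{\eta} \operatorname{Im} G^{(T)}_{ii}, \tag{2.29} $$

which is sometimes called the Ward identity.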
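
The identities (2.26) to (2.29) are exact algebraic statements, so they can be checked numerically on a small matrix. The following is a minimal sketch, assuming a real symmetric Gaussian test matrix; the helper `resolvent_minor` is a hypothetical name introduced here, and it implements the minor by deleting rows and columns and padding with zeros, which agrees with $G^{(T)}$ on all entries whose indices lie outside $T$.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 6
A = rng.standard_normal((N, N))
H = (A + A.T) / np.sqrt(2 * N)        # real symmetric test matrix (an assumption)
z = 0.3 + 0.5j                        # spectral parameter z = E + i*eta
eta = z.imag


def resolvent_minor(H, z, removed):
    """Resolvent of H with the rows/columns in `removed` deleted.

    The result is embedded back into an N x N array; entries carrying an
    index in `removed` are left at zero (they are never used below).
    """
    keep = [a for a in range(H.shape[0]) if a not in removed]
    G = np.zeros(H.shape, dtype=complex)
    sub = H[np.ix_(keep, keep)]
    G[np.ix_(keep, keep)] = np.linalg.inv(sub - z * np.eye(len(keep)))
    return G


G = resolvent_minor(H, z, [])
i, j, k = 0, 1, 2
G_k = resolvent_minor(H, z, [k])      # G^{(k)}
G_i = resolvent_minor(H, z, [i])      # G^{(i)}

# Schur complement formula (2.26) with T empty; rows/columns i of G_i vanish,
# so the full matrix product equals the restricted sum over k, l != i.
err_226 = abs(1 / G[i, i] - (H[i, i] - z - H[i, :] @ G_i @ H[:, i]))
# First identity in (2.27).
err_227 = abs(G[i, j] - (G_k[i, j] + G[i, k] * G[k, j] / G[k, k]))
# First identity in (2.28).
err_228 = abs(G[i, j] + G[i, i] * (H[i, :] @ G_i[:, j]))
# Ward identity (2.29).
err_229 = abs(np.sum(np.abs(G[i, :]) ** 2) - G[i, i].imag / eta)

print(err_226, err_227, err_228, err_229)
```

All four residuals should be at the level of floating point rounding; the check does not depend on the distribution of the entries, only on $H$ being Hermitian.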

Finally, in order to estimate large sums of independent random variables as in (2.26) and (2.28),
we will need a large deviation estimate for linear and quadratic functionals of independent random
variables:

Theorem 2.9 (Large deviation bounds). Let $(X_i^{(N)})$ and $(Y_i^{(N)})$ be independent families of random
variables and $(a_{ij}^{(N)})$ and $(b_i^{(N)})$ be deterministic; here $N \in \mathbb{N}$ and $i,j = 1, \dots, N$. Suppose that all
entries $X_i^{(N)}$ and $Y_i^{(N)}$ are independent and satisfy

$$ \mathbb{E}X = 0, \qquad \mathbb{E}|X|^2 = 1, \qquad (\mathbb{E}|X|^p)^{1/p} \le \mu_p \tag{2.30} $$

for all $p \in \mathbb{N}$ and some constants $\mu_p$. Then we have the bounds

$$ \sum_i b_i X_i \prec \Big(\sum_i |b_i|^2\Big)^{1/2}, \tag{2.31} $$

$$ \sum_{i,j} a_{ij} X_i Y_j \prec \Big(\sum_{i,j} |a_{ij}|^2\Big)^{1/2}, \tag{2.32} $$

$$ \sum_{i \ne j} a_{ij} X_i X_j \prec \Big(\sum_{i \ne j} |a_{ij}|^2\Big)^{1/2}. \tag{2.33} $$

Sketch of the proof. The estimates (2.31), (2.32), and (2.33) follow from estimating high moments
of the left-hand sides combined with Chebyshev's inequality. The high moments of (2.31)
directly follow from the Marcinkiewicz-Zygmund martingale inequality. High moments of (2.32)
and (2.33) are computed by reducing them to (2.31) with a decoupling argument. The details are
found in Lemmas B.2, B.3, and B.4 of [29]. □



2.3 Self-consistent equations on three levels

By the Schur complement formula (2.26), we have

$$ G_{ii} = \frac{1}{h_{ii} - z - \sum_{k,l}^{(i)} h_{ik}\, G^{(i)}_{kl}\, h_{li}}. \tag{2.34} $$

The partial expectation with respect to the index $i$ gives

$$ P_i \sum_{k,l}^{(i)} h_{ik}\, G^{(i)}_{kl}\, h_{li} = \sum_k^{(i)} s_{ik}\, G^{(i)}_{kk} = \sum_k^{(i)} s_{ik}\, G_{kk} - \sum_k^{(i)} s_{ik} \frac{G_{ik} G_{ki}}{G_{ii}}, $$

where in the second step we used (2.27). Introducing the notation

$$ v_i := G_{ii} - m $$

and recalling (2.3), we get the following self-consistent equation for $v_i$:

$$ v_i = \frac{1}{-z - m - \big(\sum_k s_{ik} v_k - \Upsilon_i\big)} - m, \tag{2.35} $$

where

$$ \Upsilon_i := A_i + h_{ii} - Z_i, \qquad A_i := \sum_k s_{ik} \frac{G_{ik} G_{ki}}{G_{ii}}, \qquad Z_i := Q_i \sum_{k,l}^{(i)} h_{ik}\, G^{(i)}_{kl}\, h_{li}. \tag{2.36} $$

All these quantities depend on $z$, which fact is suppressed in the notation. We will show that $\Upsilon$
is a lower order error term. This is clear about $h_{ii}$ by (2.16). The term $A_i$ will be small since
off-diagonal resolvent entries are small. Finally, $Z_i$ will be small by a large deviation estimate,
Theorem 2.9. Before we present more details, we heuristically show the power of the self-consistent
equations.

2.3.1 A scalar self-consistent equation

Introduce the notation

$$ [a] := \frac{1}{N} \sum_i a_i \tag{2.37} $$

for the average of a vector $(a_i)_{i=1}^N$. Consider the standard Wigner case, $s_{ij} = 1/N$. Then

$$ \sum_k s_{ik} v_k = \frac{1}{N} \sum_k v_k = [v] \quad ( = m_N - m). $$

Neglecting $\Upsilon_i$ in (2.35) and taking the average of this relation for each $i$, we get

$$ [v] \approx \frac{1}{-z - m - [v]} - m. \tag{2.38} $$

Since

$$ m = \frac{1}{-z - m} $$

by the defining equation of $m$, see (2.9), and this equation is stable under small perturbations, at
least away from the spectral edges $z = \pm 2$, we obtain from (2.38) that $[v] \approx 0$. This means that
$m_N \approx m$ and hence $\varrho_N \approx \varrho$, i.e. we obtained Wigner's original semicircle law.

Historically the semicircle law was first found via the moment method [86] by computing
$\frac{1}{N}\mathbb{E}\operatorname{Tr} H^k$, $k = 1,2,\dots$ in the $N \to \infty$ limit, and identifying them with the moments of the
semicircle measure $\varrho(x)\,dx$. In this approach the semicircle law emerges as a result of a somewhat
tedious, albeit elementary calculation. A more direct approach is to take the average in $i$ of
Schur's formula (2.34), which immediately gives

$$ m_N \approx \frac{1}{-z - m_N} $$

after neglecting the error terms $\Upsilon_i$. This identifies the limit of $m_N$ immediately with $m$, the
(unique) solution to (2.9). Taking the inverse Stieltjes transform then yields the semicircle law in
a very direct way.
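
The scalar equation $m = 1/(-z-m)$ and the approximation $m_N \approx m$ can be illustrated numerically. The sketch below is an illustration under stated assumptions, not part of the proof: it assumes a real symmetric Gaussian matrix with off-diagonal variance $1/N$ as the Wigner matrix, and the function names `m_semicircle` and `m_by_iteration` are hypothetical helpers introduced here.

```python
import numpy as np


def m_semicircle(z):
    """Root of m^2 + z*m + 1 = 0 with Im m > 0, i.e. the solution of m = 1/(-z - m)."""
    root = np.sqrt(z * z - 4 + 0j)
    m = (-z + root) / 2
    return m if m.imag > 0 else (-z - root) / 2


def m_by_iteration(z, steps=1000):
    """Iterate the map m -> 1/(-z - m), started from m = i in the upper half plane."""
    m = 1j
    for _ in range(steps):
        m = 1 / (-z - m)
    return m


rng = np.random.default_rng(1)
N = 500
A = rng.standard_normal((N, N))
H = (A + A.T) / np.sqrt(2 * N)        # off-diagonal variance 1/N (assumed model)
z = 0.5 + 0.2j

eigenvalues = np.linalg.eigvalsh(H)
m_N = np.mean(1 / (eigenvalues - z))  # m_N(z) = (1/N) Tr G(z)
m_exact = m_semicircle(z)
m_fixed_point = m_by_iteration(z)

print("m from the quadratic formula:", m_exact)
print("m from the fixed point map:  ", m_fixed_point)
print("empirical m_N:               ", m_N)
```

The iteration converges because the derivative of the map at the fixed point is $m^2$ and $|m| < 1$ for $\operatorname{Im} z > 0$; this is the same stability that is used, away from the edges, to pass from (2.38) to $[v] \approx 0$.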



2.3.2 A vector self-consistent equation 

If the variances $s_{ij}$ are not constant or we are interested in individual resolvent matrix elements $G_{ii}$
instead of their average, $\frac{1}{N}\operatorname{Tr} G$, then the scalar equation (2.38) discussed in the previous section
is not sufficient. We have to consider (2.35) as a system of equations for the components of the
vector $v = (v_1, \dots, v_N)$.

From the explicit formula for $m$ (2.10), we know that $|m + z| \ge 1$. Assuming temporarily that

$$ \Big|\sum_k s_{ik} v_k - \Upsilon_i\Big| \le \frac{1}{2}, \tag{2.39} $$

we can expand the right-hand side of (2.35) around $-z - m$ up to second order and using the
identity (2.9) we obtain

$$ v_i = m^2 \Big(\sum_k s_{ik} v_k - \Upsilon_i\Big) + O\Big(\Big(\sum_k s_{ik} v_k - \Upsilon_i\Big)^2\Big). \tag{2.40} $$

This is the key equation to study $v_i$. Considering all higher order terms and $\Upsilon_i$ as errors we get a
self-consistent equation

$$ v_i = m^2 \sum_k s_{ik} v_k + \text{Error} $$

or, with matrix notation for the vector $v$:

$$ v = m^2 S v + \mathcal{E}, \tag{2.41} $$

where $\mathcal{E}$ represents the vector of error terms. Thus

$$ v = \frac{1}{1 - m^2 S}\,\mathcal{E}, \qquad \text{hence} \qquad \|v\|_\infty \le \Big\|\frac{1}{1 - m^2 S}\Big\|_{\infty \to \infty} \|\mathcal{E}\|_\infty = \Gamma\, \|\mathcal{E}\|_\infty, $$

and this relation shows how the quantity $\Gamma$ emerges. If the error term is indeed small and $\Gamma$ is
bounded, then we obtain that $\|v\|_\infty = \max_i |G_{ii} - m|$ is small.

2.3.3 A matrix self-consistent equation 

The off-diagonal resolvent matrix elements, $G_{ij}$, are strongly oscillating quantities and they are not
expected to have a deterministic limit. However, the local averages of their squares,

$$ T_{xy} := \sum_i s_{xi} |G_{iy}|^2, $$

are expected to behave regularly. Note that in the Wigner case, using the identity (2.29), the
quantity

$$ T_{xy} = \sum_i s_{xi} |G_{iy}|^2 = \frac{1}{N} \sum_i |G_{iy}|^2 = \frac{1}{N\eta} \operatorname{Im} G_{yy} $$

is independent of $x$, but in the general case $T_{xy}$ carries information on the localization length of
the eigenfunctions. In particular, if $T_{xy}$ decays only beyond a scale $\ell$, i.e. $T_{xy}$ remains comparable
with $T_{xx}$ for $|x - y| \ll \ell$, then most eigenfunctions have a localization length at least $\ell$.
The self-consistent equation for $T_{xy}$ can be derived from (2.28):

$$ G_{iy} = -G_{ii} \sum_k^{(i)} h_{ik}\, G^{(i)}_{ky}. $$

Replacing $G_{ii}$ with $m$ and taking the square, we have

$$ |G_{iy}|^2 \approx |m|^2 \sum_{k,l}^{(i)} h_{ik}\, \overline{h_{il}}\; G^{(i)}_{ky}\, \overline{G^{(i)}_{ly}}. $$

Taking partial expectation yields

$$ P_i |G_{iy}|^2 \approx |m|^2 \sum_k^{(i)} s_{ik} |G^{(i)}_{ky}|^2 \approx |m|^2 \sum_k s_{ik} |G_{ky}|^2 = |m|^2 T_{iy}, $$

where in the second step we used (2.27) to remove the upper index $i$ and we used that the
off-diagonal elements are of smaller order. This formula holds for $i \ne y$; for the special case $i = y$ we
have a diagonal element, i.e.

$$ P_i |G_{iy}|^2 \approx |m|^2 T_{iy} + |m|^2 \delta_{iy}. $$

Thus we have

$$ T_{xy} = \sum_i s_{xi} |G_{iy}|^2 = \sum_i s_{xi} P_i |G_{iy}|^2 + \widetilde{T}_{xy}, \qquad \widetilde{T}_{xy} := \sum_i s_{xi} Q_i |G_{iy}|^2. \tag{2.42} $$

The term $\widetilde{T}_{xy}$ is lower order by a fluctuation averaging mechanism, see the explanation after
Theorem 2.16. Hence, neglecting this term we have

$$ T_{xy} \approx \sum_i s_{xi} P_i |G_{iy}|^2 \approx |m|^2 \sum_i s_{xi} T_{iy} + |m|^2 s_{xy}, $$

i.e. the matrix $T$ satisfies the self-consistent matrix equation

$$ T = |m|^2 S T + |m|^2 S + \mathcal{E}, \tag{2.43} $$

where $\mathcal{E}$ is an error matrix. The solution is

$$ T = \frac{|m|^2 S}{1 - |m|^2 S} + \frac{1}{1 - |m|^2 S}\,\mathcal{E}, \tag{2.44} $$

where the first term, given explicitly in terms of the variance matrix $S$, gives the leading order
behavior for $T$:

$$ T \approx \Theta, \qquad \Theta := \frac{|m|^2 S}{1 - |m|^2 S}. $$

2.3.4 Application: diffusion profile for random band matrices 

Depending on the structure of $S$, in some cases $\Theta$ can be computed. Consider, for example, the case of
random band matrices, (2.4). Here $s_{xy}$ and hence $\Theta_{xy}$ are translation invariant, $\Theta_{xy} = \theta_{x-y}$,
and the Fourier transform of $\theta$ is approximately given by

$$ \widehat{\theta}(p) \approx \frac{1}{\alpha + W^2 D p^2}, \qquad D := \frac{1}{2} \int x^2 f(x)\,dx, \tag{2.45} $$

with a parameter $\alpha$ of order $\eta$. This means that the profile of $\theta$ is approximately given by a diffusion profile on scale $W$ with
diffusion constant $D$.

The analysis of the error term in (2.44) requires estimating the norm of $(1 - |m|^2 S)^{-1}$. Note that
unlike in the analysis of (2.41), here $1 - |m|^2 S$ and not $1 - m^2 S$ has to be inverted. Since $|m|^2 \approx
1 - C\eta$ for small $\eta$ and $S$ has eigenvalue 1, the inverse of $1 - |m|^2 S$ is very unstable. Fortunately,
one can subtract the constant mode in (2.43) before solving the equation, thus eventually only the
norm of $(1 - |m|^2 S)^{-1}$ on the subspace orthogonal to the constants is relevant. Thus the spectral
gap of $S$ plays an important role and we use that for band matrices the gap is of order $(W/N)^2$.
The details are found in [33], where, among other results, the following theorem was shown:

Theorem 2.10 (Diffusion profile). [33, Theorem 2.4] Let $H$ be a random band matrix with band
width $W$, i.e. the variances are given by (2.4). Suppose that $N \ll W^{5/4}$ and $(W/N)^2 \le \eta \le 1$.
Then

H -L + ^PL. (2.46) 



\T xy @ X y\ -< N 



Nrj y/W 

All estimates are uniform in $x,y \in \mathbb{T}$ and in the spectral parameter $z = E + i\eta$ for $|E| \le 2 - \kappa$ and
for any fixed $\kappa > 0$.
This theorem identifies $|G_{xy}|^2$ in two different senses of averaging; $T_{xy}$ averages in one of the
indices, while $P_x$ takes partial expectation. In both cases the result is essentially $\Theta_{xy}$.

The behavior of $\theta_{x-y} = \Theta_{xy}$ can be analyzed by inverse Fourier transform from (2.45). If
$\eta \ll (W/N)^2$ then $\theta$ is essentially a constant, i.e. the profile is flat. Conversely, if $\eta \gg (W/N)^2$
then we get an exponentially decaying profile on the scale $|x| \sim W\eta^{-1/2}$. The shape of the profile
is therefore nontrivial if and only if $\eta \gg (W/N)^2$. The total mass of the profile is

$$ \sum_x \theta_x = \frac{|m|^2}{1 - |m|^2} = \frac{\operatorname{Im} m}{\eta}, \tag{2.47} $$

and the average height of the profile is of order $(N\eta)^{-1}$. The peak of the exponential profile has
height of order $(W\sqrt{\eta})^{-1}$, which dominates over the average height if and only if $\eta \gg (W/N)^2$.
The regime $\eta \gg (W/N)^2$ corresponds to the regime where $\eta$ is sufficiently large that the complete
delocalization has not taken place, and the profile is mostly concentrated in the region $|x - y| \le
W\eta^{-1/2} \ll N$.
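
The two regimes can be seen directly by computing $\Theta = |m|^2 S (1 - |m|^2 S)^{-1}$ for a translation invariant $S$ on a one-dimensional torus. The following is a minimal sketch under assumptions that are not taken from the text: it uses a flat band profile, $s_{xy} = (2W+1)^{-1}$ for torus distance at most $W$, the energy $E = 0$, and hypothetical helper names `m_semicircle` and `diffusion_profile`. Since $S$ is circulant, it is diagonalized by the discrete Fourier transform.

```python
import numpy as np


def m_semicircle(z):
    """Root of m^2 + z*m + 1 = 0 with Im m > 0."""
    root = np.sqrt(z * z - 4 + 0j)
    m = (-z + root) / 2
    return m if m.imag > 0 else (-z - root) / 2


def diffusion_profile(N, W, z):
    """First row theta_x of Theta = |m|^2 S / (1 - |m|^2 S) for a flat band S."""
    x = np.arange(N)
    dist = np.minimum(x, N - x)                    # distance to 0 on the torus
    s = (dist <= W).astype(float) / (2 * W + 1)    # first row of circulant S
    m2 = abs(m_semicircle(z)) ** 2
    s_hat = np.fft.fft(s).real                     # eigenvalues of S (s is even)
    theta_hat = m2 * s_hat / (1 - m2 * s_hat)      # eigenvalues of Theta
    return np.fft.ifft(theta_hat).real


W = 20
N = 25 * W                                         # so (W/N)^2 = 5^(-4)
for k in (1, 4, 6):
    eta = 5.0 ** (-k)
    theta = diffusion_profile(N, W, 1j * eta)
    print(f"eta = 5^-{k}: eta * mass = {eta * theta.sum():.4f}, "
          f"max/min = {theta.max() / theta.min():.3g}")
```

The normalized mass $\eta \sum_x \theta_x$ equals $\operatorname{Im} m$ for every $\eta$, in line with (2.47), while the ratio of the maximum to the minimum of the profile is large for $\eta \gg (W/N)^2$ and close to 1 for $\eta \ll (W/N)^2$.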

These scenarios are best understood in a dynamical picture in which $\eta$ is decreased down from 1.
The ensuing dynamics of $\theta$ corresponds to the diffusion approximation, where the quantum problem
is replaced with a random walk of step-size of order $W$. On a configuration space consisting of
$N$ sites, such a random walk will reach an equilibrium beyond time scales $(N/W)^2$. Here $\eta^{-1}$
plays essentially the role of time $t$, so that in this dynamical picture equilibrium is reached for
$t \sim \eta^{-1} \gg (N/W)^2$. Figure 2.1 illustrates this diffusive spreading of the profile for different values
of $\eta$.




Figure 2.1: A plot of the diffusion profile function $\theta$ at five different values of $\eta$, where the argument
$x$ ranges over the torus $\mathbb{T}$ (horizontal axis from $-N/2$ to $N/2$ in both panels). Left: the graph $x \mapsto \eta\theta_x$ (see (2.47) for the choice of normalization).
Right: the graph $x \mapsto \log \theta_x$. Here we chose $N = 25W$ and $\eta = 5^{-k}$ for $k = 1,2,3,4,5$. The cases
$k = 1,2,3$ (where $\eta > (W/N)^2$) are drawn using dashed lines, the case $k = 4$ (where $\eta = (W/N)^2$)
using solid lines, and the case $k = 5$ (where $\eta < (W/N)^2$) using dotted lines.



One important consequence of Theorem 2.10 is that it proves delocalization for band matrices
with band width $W \gg N^{4/5}$ (see Corollary 2.3 of [33] for the precise statement). This improves
the earlier result from [27, 28] where delocalization for $W \gg N^{6/7}$ was proved with very different
methods. We remark that from the other side it is known that narrow band matrices with $W \ll N^{1/8}$
are in the localized regime [77]. The conjectured threshold for the phase transition is $W \sim \sqrt{N}$,
see [5T].



2.4 Proof of the local semicircle law without using the spectral gap 

In this section we sketch the proof of a weaker version of Theorem 2.3, namely we replace the threshold
$\widetilde\eta_E$ with a larger threshold $\eta_E$ defined as

$$ \eta_E := \min\Big\{ \eta \,:\, \frac{1}{M \operatorname{Im} z} \le \min\Big\{ \frac{M^{-\gamma}}{\Gamma(z)^3}, \frac{M^{-2\gamma}}{\Gamma(z)^4 \operatorname{Im} m(z)} \Big\} \text{ for all } z \in [E + i\eta, E + 10i] \Big\}. \tag{2.48} $$

This definition is exactly the same as (2.17), but $\widetilde\Gamma$ is replaced with the larger quantity $\Gamma$, in other
words we do not make use of the spectral gap in $S$. This will pedagogically simplify the presentation
and in Section 2.5 we will comment on the modifications for the stronger result.

We recall that there is no difference between $\Gamma$ and $\widetilde\Gamma$ away from the edges (both are of order
1), so readers interested in the local semicircle law only in the bulk should be content with the
simpler proof. Near the spectral edges, however, there is a substantial difference. Note that even in
the Wigner case (see (2.14)), $\eta_E$ is much larger near the spectral edges than the optimal threshold
$\widetilde\eta_E \sim 1/N$.

Definition 2.11. We call a deterministic nonnegative function $\Psi = \Psi^{(N)}(z)$ an admissible control
parameter if we have

$$ cM^{-1/2} \le \Psi \le M^{-c} $$

for some constant $c > 0$ and large enough $N$. Moreover, we call any (possibly $N$-dependent) subset

$$ \mathbf{D} = \mathbf{D}^{(N)} \subset \{z : |E| \le 10,\ \eta \ge M^{-1+\gamma}\} $$

a spectral domain.

In this section we will mostly use the spectral domain

$$ \mathbf{S} := \{z : |E| \le 10,\ \eta \in [\eta_E, 10]\}. $$

Define the random control parameters

$$ \Lambda_o := \max_{i \ne j} |G_{ij}|, \qquad \Lambda_d := \max_i |G_{ii} - m|, \qquad \Lambda := \max(\Lambda_o, \Lambda_d), \qquad \Theta := |m_N - m|. \tag{2.49} $$

In the typical regime where we work, all these quantities are small. The key quantity is $\Lambda$ and we
will develop an iterative argument to control it. The first step is an a priori bound:

Proposition 2.12. We have $\Lambda \prec M^{-\gamma/3}\Gamma^{-1}$ uniformly in $\mathbf{S}$.

The main estimate behind the proof of Theorem 2.3 for $\eta \ge \eta_E$ is the following iteration
statement:

Proposition 2.13. Let $\Psi$ be a control parameter satisfying

$$ cM^{-1/2} \le \Psi \le M^{-\gamma/3}\Gamma^{-1} \tag{2.50} $$

and fix $\varepsilon \in (0, \gamma/3)$. Then on the domain $\mathbf{S}$ we have the implication

$$ \Lambda \prec \Psi \quad \Longrightarrow \quad \Lambda \prec F(\Psi), \tag{2.51} $$

where we defined

$$ F(\Psi) := M^{-\varepsilon}\Psi + \sqrt{\frac{\operatorname{Im} m}{M\eta}} + \frac{M^{\varepsilon}}{M\eta}. $$

The proofs of these two propositions are postponed; we first complete the proof of the local
semicircle law.

It is easy to check that, on the domain $\mathbf{S}$, if $\Psi$ satisfies (2.50) then so does $F(\Psi)$. We may
therefore iterate (2.51). This yields a bound on $\Lambda$ that is essentially the fixed point of the map
$\Psi \mapsto F(\Psi)$, which is given by $\Pi$, defined in (2.18) (up to the factor $M^{\varepsilon}$). More precisely, the
iteration is started with $\Psi_0 := M^{-\gamma/3}\Gamma^{-1}$; the initial hypothesis $\Lambda \prec \Psi_0$ is provided by Proposition
2.12. For $k \ge 0$ we set $\Psi_{k+1} := F(\Psi_k)$. Hence from (2.51) we conclude that $\Lambda \prec \Psi_k$ for all $k$.
Choosing $k := \lceil \varepsilon^{-1} \rceil$ yields

$$ \Lambda \prec \sqrt{\frac{\operatorname{Im} m}{M\eta}} + \frac{M^{\varepsilon}}{M\eta}. $$

Since $\varepsilon$ was arbitrary, we have proved that

$$ \Lambda \prec \Pi, \tag{2.52} $$

which is (2.19).
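
The bootstrap above is a deterministic iteration of an affine map, so its effect is easy to trace numerically. The sketch below is only an illustration: the numerical values of $M$, $\eta$, $\operatorname{Im} m$, $\gamma$, $\Gamma$ and $\varepsilon$ are arbitrary assumptions chosen so that $\varepsilon \in (0, \gamma/3)$, and `iterate_F` is a hypothetical helper name.

```python
import math


def iterate_F(M, eta, im_m, gamma, Gamma, eps):
    """Iterate Psi_{k+1} = F(Psi_k) for ceil(1/eps) steps, starting from Psi_0."""
    # Psi-independent part of F: sqrt(Im m / (M eta)) + M^eps / (M eta).
    target = math.sqrt(im_m / (M * eta)) + M ** eps / (M * eta)
    psi = M ** (-gamma / 3) / Gamma            # Psi_0 from Proposition 2.12
    history = [psi]
    for _ in range(math.ceil(1 / eps)):
        psi = M ** (-eps) * psi + target       # F(Psi) = M^{-eps} Psi + target
        history.append(psi)
    return history, target


# Illustrative parameter values (assumptions, not taken from the text).
history, target = iterate_F(M=1e6, eta=1e-3, im_m=1.0, gamma=0.3, Gamma=1.0, eps=0.05)
print("Psi_0      =", history[0])
print("Psi_final  =", history[-1])
print("target     =", target)
```

After $\lceil \varepsilon^{-1} \rceil$ steps the contribution of the initial value has been multiplied by $M^{-\varepsilon \lceil 1/\varepsilon \rceil} \le M^{-1}$, so the final value is comparable to the $\Psi$-independent part of $F$, which is the mechanism behind (2.52).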

To prove (2.20), i.e. to estimate $\Theta$, we rewrite (2.35) as

$$ -\sum_k s_{ik} v_k + \Upsilon_i = \frac{1}{m + v_i} - \frac{1}{m}, \tag{2.53} $$

and expand the right-hand side. Since $|m| \ge c$ and $|v_i| \le \Lambda$, the expansion is possible on the event
where $\Lambda \ll 1$, which occurs with very high probability by Proposition 2.12. On this event we get

$$ m^2 \Big( -\sum_k s_{ik} v_k + \Upsilon_i \Big) = -v_i + O(\Lambda^2). \tag{2.54} $$

Averaging in (2.54) yields

$$ m^2 \big( -[v] + [\Upsilon] \big) = -[v] + O_\prec(\Lambda^2). \tag{2.55} $$

We will show in Lemma 2.15 in the next section that $|\Upsilon_i| \prec \Pi$, but in fact the average $[\Upsilon]$ is
one order better. This is due to the fluctuation averaging phenomenon, and we have

Proposition 2.14. Suppose that $\Lambda_o \prec \Psi_o$ for some deterministic control parameter $\Psi_o$ satisfying
$M^{-1/2} \le \Psi_o \le M^{-c}$. Then $[\Upsilon] = O_\prec(\Psi_o^2)$.

We will explain the proof in Section 2.4.4. Using this proposition and (2.52), we get

$$ [v] = m^2 [v] + O_\prec(\Pi^2). $$

Therefore

$$ \Theta \prec \frac{\Pi^2}{|1 - m^2|} \le \Big( \frac{\operatorname{Im} m}{|1 - m^2|} + \frac{1}{|1 - m^2|\, M\eta} \Big) \frac{2}{M\eta} \le \Big( C + \frac{\Gamma}{M\eta} \Big) \frac{2}{M\eta} \le \frac{C}{M\eta}. $$

Here in the third step we used the elementary explicit bound $\operatorname{Im} m \le C|1 - m^2|$, and the bound
$\Gamma \ge |1 - m^2|^{-1}$ which follows from the definition of $\Gamma$ by applying the matrix $(1 - m^2 S)^{-1}$ to the
constant vector. The last step follows from the definition of $\mathbf{S}$. Since $\Theta = |[v]|$, this concludes the
proof of (2.20), and hence of Theorem 2.3 in the regime $\mathbf{S}$, i.e. for $\eta \ge \eta_E$. □

In the next sections we explain the proofs of the three propositions used in this argument. We
first control the off-diagonal elements, i.e. $\Lambda_o$, then we turn to the proof of Propositions 2.12, 2.13
and 2.14.



2.4.1 Basic estimates for $\Lambda_o$ and $\Upsilon_i$

Lemma 2.15. The following statements hold for any spectral domain $\mathbf{D}$ and admissible control
parameter $\Psi$. If $\Lambda \prec \Psi$ then

$$ \Lambda_o + |\Upsilon_i| \prec \sqrt{\frac{\operatorname{Im} m + \Psi}{M\eta}}. \tag{2.56} $$

Moreover, for any fixed ($N$-independent) $\eta > 0$ we have

$$ \Lambda_o + |\Upsilon_i| \prec M^{-1/2} \tag{2.57} $$

uniformly in $z \in \{w \in \mathbf{D} : \operatorname{Im} w = \eta\}$.

We remark that we could have written (2.56) as

$$ \Lambda_o + |\Upsilon_i| \prec \sqrt{\frac{\operatorname{Im} m + \Lambda}{M\eta}}, \tag{2.58} $$

but this formulation, while it carries the essence, is literally incorrect since it holds only if $\Lambda \prec M^{-c}$
has been a priori established.

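Proof of Lemma 2.15. We first observe that $\Lambda \prec \Psi \ll 1$ and the positive lower bound $|m(z)| \ge c$
imply that

$$ \frac{1}{|G_{ii}|} \prec 1. \tag{2.59} $$

A simple iteration of the expansion formulas (2.27) concludes that

$$ |G^{(T)}_{ij}| \prec \Psi \ \text{ for } i \ne j, \qquad |G^{(T)}_{ii}| \prec 1, \qquad \frac{1}{|G^{(T)}_{ii}|} \prec 1, \tag{2.60} $$

for any subset $T$ of fixed cardinality.

We begin with the first statement in Lemma 2.15. First we estimate $Z_i$, which we split as

$$ |Z_i| \le \Big| \sum_k^{(i)} \big( |h_{ik}|^2 - s_{ik} \big) G^{(i)}_{kk} \Big| + \Big| \sum_{k \ne l}^{(i)} h_{ik}\, G^{(i)}_{kl}\, h_{li} \Big|. \tag{2.61} $$

We estimate each term using Theorem 2.9 by conditioning on $G^{(i)}$ and using the fact that the family
$(h_{ik})_{k=1}^N$ is independent of $G^{(i)}$. By (2.31) the first term of (2.61) is stochastically dominated by

$$ \Big[ \sum_k^{(i)} s_{ik}^2\, |G^{(i)}_{kk}|^2 \Big]^{1/2} \prec M^{-1/2}, $$

where (2.60), (2.1) and (2.3) were used. For the second term of (2.61) we apply the bound (2.33) of Theorem 2.9
with $a_{kl} = s_{ik}^{1/2}\, G^{(i)}_{kl}\, s_{li}^{1/2}$ and $X_k = \zeta_{ik}$ (see (2.3)). We find

$$ \sum_{k,l}^{(i)} s_{ik}\, |G^{(i)}_{kl}|^2\, s_{li} \le \frac{1}{M} \sum_{k,l}^{(i)} s_{ik}\, |G^{(i)}_{kl}|^2 = \frac{1}{M\eta} \sum_k^{(i)} s_{ik} \operatorname{Im} G^{(i)}_{kk} \prec \frac{\operatorname{Im} m + \Psi}{M\eta}, \tag{2.62} $$

where in the last step we used (2.27) and the estimate $1/G_{ii} \prec 1$. Thus we get

$$ |Z_i| \prec \sqrt{\frac{\operatorname{Im} m + \Psi}{M\eta}}, \tag{2.63} $$

where we absorbed the bound $M^{-1/2}$ on the first term of (2.61) into the right-hand side of (2.63),
using $\operatorname{Im} m \ge c\eta$ as follows from an explicit estimate.

Next, we estimate $\Lambda_o$. We can iterate (2.28) once to get, for $i \ne j$,

$$ G_{ij} = -G_{ii} \sum_k^{(i)} h_{ik}\, G^{(i)}_{kj} = -G_{ii}\, G^{(i)}_{jj} \Big( h_{ij} - \sum_{k,l}^{(ij)} h_{ik}\, G^{(ij)}_{kl}\, h_{lj} \Big). \tag{2.64} $$

The term $h_{ij}$ is trivially $O_\prec(M^{-1/2})$. In order to estimate the other term, we invoke the bound (2.32) of Theorem 2.9
with $a_{kl} = s_{ik}^{1/2}\, G^{(ij)}_{kl}\, s_{lj}^{1/2}$, $X_k = \zeta_{ik}$, and $Y_l = \zeta_{lj}$. As in (2.62), we find

$$ \sum_{k,l}^{(ij)} s_{ik}\, |G^{(ij)}_{kl}|^2\, s_{lj} \prec \frac{\operatorname{Im} m + \Psi}{M\eta}, $$

and thus

$$ \Lambda_o \prec \sqrt{\frac{\operatorname{Im} m + \Psi}{M\eta}}, \tag{2.65} $$

where we again absorbed the term $h_{ij} \prec M^{-1/2}$ into the right-hand side.

In order to estimate $A_i$ and $h_{ii}$ in the definition of $\Upsilon_i$, we use (2.60) to estimate

$$ |A_i| + |h_{ii}| \prec \Lambda_o^2 + M^{-1/2} \le \Lambda_o + C \sqrt{\frac{\operatorname{Im} m}{M\eta}} \prec \sqrt{\frac{\operatorname{Im} m + \Psi}{M\eta}}, $$

where the second step follows from $\operatorname{Im} m \ge c\eta$. Collecting (2.63) and (2.65), this completes the proof
of (2.56).

The proof of (2.57) is almost identical to that of (2.56). The quantities $|G^{(i)}_{kk}|$ and $|G^{(ij)}_{kk}|$ are
estimated by the trivial deterministic bound $\eta^{-1} = O(1)$. We omit the details. □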



2.4.2 Sketch of the proof of Proposition 2.12

The core of the proof is a continuity argument. Its basic idea is to establish a gap in the range of
$\Lambda$ by proving

Claim 1. On the event $\Lambda \le M^{-\gamma/4}\Gamma^{-1}$ we actually have the stronger bound $\Lambda \prec M^{-\gamma/2}\Gamma^{-1}$.

In other words, for all $z \in \mathbf{S}$, with high probability either $\Lambda \le M^{-\gamma/2}\Gamma^{-1}$ or $\Lambda \ge M^{-\gamma/4}\Gamma^{-1}$. The
second step is to show that $\Lambda \le M^{-\gamma/2}\Gamma^{-1}$ holds for $z$ with a large imaginary part $\eta$:

Claim 2. We have $\Lambda \prec M^{-1/2}$ uniformly in $z \in [-10, 10] + 2i$.

Thus, for large $\eta$ the parameter $\Lambda$ is below the gap. Using the fact that $\Lambda$ is continuous in
$\eta = \operatorname{Im} z$ and hence cannot jump from one side of the gap to the other, we then conclude that $\Lambda$ is
below the gap for all $z \in \mathbf{S}$ and this is Proposition 2.12. See Figure 2.2 for an illustration of this
argument.

Figure 2.2: The $(\eta, \Lambda)$-plane for a fixed $E$ with the graph of $\eta \mapsto \Lambda(E + i\eta)$; the levels $M^{-\gamma/4}\Gamma^{-1}$ and $M^{-\gamma/2}\Gamma^{-1}$ are marked on the vertical axis and $\eta_E$ on the horizontal axis. The shaded region is
forbidden with high probability by Claim 1. The initial estimate, Claim 2, is marked with a black
dot. The graph of $\Lambda = \Lambda(E + i\eta)$ is continuous and lies beneath the shaded region. Note that this
method does not control $\Lambda(E + i\eta)$ in the regime $\eta \le \eta_E$.

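Now we explain Claim 1. We will work on the event $\Lambda \le M^{-\gamma/4}\Gamma^{-1} \le M^{-c}$, where we may
invoke (2.56) to estimate $\Lambda_o$ and $\Upsilon_i$. In order to estimate $\Lambda_d$, we expand the right-hand side of
(2.53) in $v_i$ to get (2.54). Using (2.56) to estimate $\Upsilon_i$, we therefore have

$$ v_i - m^2 \sum_k s_{ik} v_k = O_\prec\Big( \Lambda^2 + \sqrt{\frac{\operatorname{Im} m + \Lambda}{M\eta}} \Big). \tag{2.66} $$

We write the left-hand side as $[(1 - m^2 S)v]_i$ with the vector $v = (v_i)_{i=1}^N$. Inverting the operator
$1 - m^2 S$, we therefore conclude that

$$ \Lambda_d = \max_i |v_i| \prec \Gamma \Big( \Lambda^2 + \sqrt{\frac{\operatorname{Im} m + \Lambda}{M\eta}} \Big). \tag{2.67} $$

Together with (2.56) and $\Gamma \ge c$, we therefore get

$$ \Lambda \prec \Gamma \Big( \Lambda^2 + \sqrt{\frac{\operatorname{Im} m + \Lambda}{M\eta}} \Big). \tag{2.68} $$

On the event $\Lambda \le M^{-\gamma/4}\Gamma^{-1}$ we may estimate

$$ \Gamma \Lambda^2 \le M^{-\gamma/2}\Gamma^{-1}. $$

Moreover, by definition of $\mathbf{S}$, we have $\Gamma^3 \Pi \le M^{-\gamma}$. Using the definition of $\Pi$ (2.18), we therefore
get

$$ \Gamma \sqrt{\frac{\operatorname{Im} m + \Lambda}{M\eta}} \le \Gamma \Pi + \Gamma \sqrt{\Gamma^{-1} \Pi} \le C M^{-\gamma/2}\Gamma^{-1}. $$

Plugging this into (2.68) yields $\Lambda \prec M^{-\gamma/2}\Gamma^{-1}$, which is Claim 1.
Finally, we explain Claim 2. We write (2.53) as

$$ v_i = \frac{m \big( \sum_k s_{ik} v_k - \Upsilon_i \big)}{m^{-1} - \sum_k s_{ik} v_k + \Upsilon_i}. \tag{2.69} $$

In the regime $\eta = 2$, all resolvent entries are bounded by $1/2$, thus $\Lambda \le 1$. Since $|\Upsilon_i| \prec M^{-1/2}$ by
(2.57) and $|m^{-1}| \ge 2$ we find

$$ \Big| m^{-1} - \sum_k s_{ik} v_k + \Upsilon_i \Big| \ge 1 + O_\prec(M^{-1/2}). $$

Using $|m| \le 1/2$ we therefore conclude from (2.69) that

$$ \Lambda_d \le \frac{\Lambda_d + O_\prec(M^{-1/2})}{2 + O_\prec(M^{-1/2})} = \frac{\Lambda_d}{2} + O_\prec(M^{-1/2}), $$

i.e. $\Lambda_d = O_\prec(M^{-1/2})$. Together with the estimate on $\Lambda_o$ from (2.57), we obtained Claim 2. □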
2.4.3 Sketch of the proof of Proposition 2.13

This argument is very similar to the proof of Claim 1 in Section 2.4.2. We can work on the event
$\Lambda \le M^{-\gamma/4}$, then the bound (2.56) is available to estimate $\Lambda_o$ and $\Upsilon_i$. Next, we estimate $\Lambda_d$. We
expand the right-hand side of (2.53) and get

$$ v_i = m^2 \sum_k s_{ik} v_k - m^2 \Upsilon_i + O_\prec(\Lambda^2). $$

Using the fluctuation averaging estimate (2.75) explained in the next section, as well as (2.56), we
find

$$ \Lambda_d \prec \Gamma \Psi^2 + \sqrt{\frac{\operatorname{Im} m + \Psi}{M\eta}}, \tag{2.70} $$

which, combined with (2.56), yields

$$ \Lambda \prec \Gamma \Psi^2 + \sqrt{\frac{\operatorname{Im} m + \Psi}{M\eta}} \le M^{-\varepsilon}\Psi + \sqrt{\frac{\operatorname{Im} m}{M\eta}} + \frac{M^{\varepsilon}}{M\eta} = F(\Psi), \tag{2.71} $$

where in the last step we used the assumption $\Psi \le M^{-\gamma/3}\Gamma^{-1}$ and $\varepsilon \le \gamma/3$. □

2.4.4 Fluctuation averaging: proof of Proposition 2.14

The leading error in the self-consistent equation (2.54) for $v_i$ is $\Upsilon_i$. Among the three summands of
$\Upsilon_i$, see (2.36), typically $Z_i$ is the largest, thus we typically have

$$ v_i \prec Z_i \tag{2.72} $$

in the regime where $\Gamma$ is bounded. The large deviation bounds in Theorem 2.9 show that $Z_i \prec \Lambda_o$
and a simple second moment calculation shows that this bound is essentially optimal. On the other
hand, (2.29) shows that the typical size of the off-diagonal resolvent matrix elements is at least of
order $(N\eta)^{-1/2}$, thus the estimate

$$ Z_i \prec \Lambda_o \prec (N\eta)^{-1/2} $$

is essentially optimal in the standard Wigner case ($M = N$). Together with (2.72) this shows that
the natural bound for $\Lambda$ is $(N\eta)^{-1/2}$, which is also reflected in the bound (2.19).

However, the bound (2.20) for the average, $m_N - m = [v]$, is of order $\Lambda^2 \sim (N\eta)^{-1}$, i.e. it is one
order better than the bound for $v_i$. For the purpose of $[v]$, it is the average $[\Upsilon]$ of the leading errors
$\Upsilon_i$ that matters, see (2.55). Since $Z_i$, the leading term in $\Upsilon_i$, is a fluctuating quantity with zero
expectation, the improvement comes from the fact that fluctuations cancel out in the average. The
basic example of this phenomenon is the central limit theorem. In our case, however, the $Z_i$ are not
independent. In fact, their correlations do not decay, at least in the Wigner case where all indices $i$
play a symmetric role. Thus standard results on central limit theorems for weakly correlated random
variables do not apply.

Here we formulate a version of the fluctuation averaging mechanism, taken from [30], that is
the most useful for this discussion and we comment on the history afterwards.

We shall perform the averaging with respect to a family of weights $T = (t_{ik})$ satisfying

$$ 0 \le t_{ik} \le M^{-1}, \qquad \sum_k t_{ik} = 1. \tag{2.73} $$

Typical example weights are $t_{ik} = s_{ik}$ and $t_{ik} = N^{-1}$. Note that in both of these cases $T$ commutes
with $S$.

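Theorem 2.16 (Fluctuation averaging). Fix a spectral domain $\mathbf{D}$ and deterministic admissible control
parameters $\Psi, \Psi_o \le M^{-c}$. Suppose that $\Lambda \prec \Psi$, $\Lambda_o \prec \Psi_o$ and the weight $T = (t_{ik})$ satisfies
(2.73). Then we have

$$ \sum_k t_{ik}\, Q_k \frac{1}{G_{kk}} = O_\prec(\Psi_o^2), \qquad \sum_k t_{ik}\, Q_k G_{kk} = O_\prec(\Psi_o^2). \tag{2.74} $$

If in addition $T$ commutes with $S$ then

$$ \sum_k t_{ik}\, v_k = O_\prec(\Gamma \Psi_o^2), \qquad \sum_k t_{ik}\, (v_k - [v]) = O_\prec(\widetilde\Gamma \Psi_o^2). \tag{2.75} $$

The estimates (2.74) and (2.75) are uniform in the index $i$.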

The first version of the fluctuation averaging mechanism appeared in [16] for the Wigner case,
where $[Z] = N^{-1} \sum_k Z_k$ was bounded by $\Lambda_o^2$. Since $Q_k [G_{kk}]^{-1}$ is essentially $Z_k$, see (2.26), this
corresponds to the first bound in (2.74). A different proof (with a better bound on the constants)
was given in [47]. A conceptually streamlined version of the original proof was extended to sparse
matrices [31] and to sample covariance matrices [73]. Finally, an extensive analysis in [29] treated
the fluctuation averaging of general polynomials of resolvent entries and identified the order of
cancellations depending on the algebraic structure of the polynomial. Moreover, an additional
cancellation effect was found for the quantity $Q_i |G_{ij}|^2$. This improvement plays a key role in the
proof of Theorem 2.10, see (2.42).

All proofs of the fluctuation averaging theorems rely on computing expectations of high moments
of the averages and carefully estimating various terms of different combinatorial structure. In [29]
we have developed a Feynman diagrammatic representation for bookkeeping the terms, but this is
necessary only for the case of general polynomials. For the special cases stated in Theorem 2.16
the proof is relatively simple and it is presented in Appendix B of [30]. Here we will not repeat
the proof; we only indicate the main mechanism by estimating the second moment of the first term
in (2.74). The actual proof requires estimating all moments and then using Chebyshev's inequality to
translate the moment estimates to probabilistic estimates standing behind the notation $O_\prec(\Psi_o^2)$ in
(2.74).

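Second moment calculation. First we claim that

$$ \Big| Q_k \frac{1}{G_{kk}} \Big| \prec \Psi_o. \tag{2.76} $$

Indeed, from Schur's complement formula (2.26) we get $|Q_k (G_{kk})^{-1}| \le |h_{kk}| + |Z_k|$. The first term
is estimated by $|h_{kk}| \prec M^{-1/2} \le \Psi_o$. The second term is estimated exactly as in (2.61) and (2.62),
giving $|Z_k| \prec \Psi_o$. In fact, the same bound (2.76) holds if $G_{kk}$ is replaced with $G^{(T)}_{kk}$ as long as $|T|$
is bounded.

Abbreviate $X_k := Q_k (G_{kk})^{-1}$ and compute the variance

$$ \mathbb{E} \Big| \sum_k t_{ik} X_k \Big|^2 = \sum_{k,l} t_{ik} t_{il}\, \mathbb{E} X_k \overline{X_l} = \sum_k t_{ik}^2\, \mathbb{E} X_k \overline{X_k} + \sum_{k \ne l} t_{ik} t_{il}\, \mathbb{E} X_k \overline{X_l}. \tag{2.77} $$

Using the bounds (2.73) on $t_{ik}$ and (2.76), we find that the first term on the right-hand side of
(2.77) is $O_\prec(M^{-1} \Psi_o^2) = O_\prec(\Psi_o^4)$, where we used that $\Psi_o$ is admissible. Let us therefore focus on
the second term of (2.77). Using the fact that $k \ne l$, we apply (2.27) to $X_k$ and $X_l$ to get

$$ \mathbb{E} X_k \overline{X_l} = \mathbb{E}\, Q_k \Big( \frac{1}{G^{(l)}_{kk}} - \frac{G_{kl} G_{lk}}{G_{kk}\, G^{(l)}_{kk}\, G_{ll}} \Big)\, \overline{ Q_l \Big( \frac{1}{G^{(k)}_{ll}} - \frac{G_{lk} G_{kl}}{G_{ll}\, G^{(k)}_{ll}\, G_{kk}} \Big) }. \tag{2.78} $$

We multiply out the parentheses on the right-hand side. The crucial observation is that if the
random variable $Y$ is independent of $i$ (see Definition 2.7) then $\mathbb{E}\, Q_i(X) Y = \mathbb{E}\, Q_i(XY) = 0$. Hence
out of the four terms obtained from the right-hand side of (2.78), the only nonvanishing one is

$$ \mathbb{E}\, Q_k \Big( \frac{G_{kl} G_{lk}}{G_{kk}\, G^{(l)}_{kk}\, G_{ll}} \Big)\, \overline{ Q_l \Big( \frac{G_{lk} G_{kl}}{G_{ll}\, G^{(k)}_{ll}\, G_{kk}} \Big) } \prec \Psi_o^4, $$

where we used that the denominators are harmless, see (2.59). Together with (2.73), this concludes
the proof of $\mathbb{E} |\sum_k t_{ik} X_k|^2 \prec \Psi_o^4$, which means that $\sum_k t_{ik} X_k$ is bounded by $\Psi_o^2$ in second moment
sense. □

Finally, Proposition 2.14 directly follows from the first estimate in (2.74) with $t_{ik} = 1/N$, since
from (2.26) we have

$$ Q_k \frac{1}{G_{kk}} = h_{kk} - Z_k = \Upsilon_k - A_k = \Upsilon_k + O_\prec(\Psi_o^2). $$

Taking the average over $k$, we get $[\Upsilon] = O_\prec(\Psi_o^2)$. □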
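
The gain of one order in the average can be observed in a simulation. The sketch below is an illustration only, assuming a real symmetric Gaussian matrix as the Wigner matrix and a fixed seed; `m_semicircle` is a hypothetical helper name. It compares $\Lambda_d = \max_i |G_{ii} - m|$ with $\Theta = |[v]| = |m_N - m|$ and prints the reference scales $(N\eta)^{-1/2}$ and $(N\eta)^{-1}$.

```python
import numpy as np


def m_semicircle(z):
    """Root of m^2 + z*m + 1 = 0 with Im m > 0."""
    root = np.sqrt(z * z - 4 + 0j)
    m = (-z + root) / 2
    return m if m.imag > 0 else (-z - root) / 2


rng = np.random.default_rng(2)
N = 1000
A = rng.standard_normal((N, N))
H = (A + A.T) / np.sqrt(2 * N)            # off-diagonal variance 1/N (assumed model)
eta = 0.05
z = 0.3 + 1j * eta

lam, U = np.linalg.eigh(H)                # U[i, a] = u_a(i)
# Spectral decomposition: G_ii = sum_a |u_a(i)|^2 / (lambda_a - z).
G_diag = (U ** 2) @ (1 / (lam - z))

m = m_semicircle(z)
v = G_diag - m                            # v_i = G_ii - m
Lambda_d = np.max(np.abs(v))
Theta = abs(np.mean(v))                   # |[v]| = |m_N - m|

print("Lambda_d =", Lambda_d, " reference (N eta)^(-1/2) =", (N * eta) ** -0.5)
print("Theta    =", Theta, " reference (N eta)^(-1)   =", (N * eta) ** -1.0)
```

The inequality $\Theta \le \Lambda_d$ is trivial; the point of fluctuation averaging is that $\Theta$ is typically smaller by a further factor of order $(N\eta)^{-1/2}$, which a single sample can only suggest, not prove.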
2.5 Remark on the proof of the local semicircle law using the spectral gap

In Section 2.4 we proved the local semicircle law, Theorem 2.3, uniformly for $\eta \ge \eta_E$ instead of the larger regime $\eta \ge \widetilde\eta_E$. The difference between these two thresholds stems from the difference between $\Gamma$ and $\widetilde\Gamma$, see their respective definitions.

The bound $\Gamma$ on the norm of $(1 - m^2 S)^{-1}$ entered the proof when the self-consistent equation (2.66) was solved. The key idea is to solve the self-consistent equation (2.66) separately on the subspace of constants (the span of the vector $\mathbf{e}$) and on its orthogonal complement $\mathbf{e}^\perp$. Roughly speaking, we obtain



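$$|v_i - [v]| \prec \widetilde\Gamma\big(\Lambda^2 + r(\Lambda)\big), \qquad r(\Lambda) := \sqrt{\frac{\operatorname{Im} m + \Lambda}{M\eta}}, \qquad (2.79)$$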

instead of (2.67). In fact, we can improve this by using the fluctuation averaging. Subtracting the average over $i$ from the self-consistent equation (2.54) and estimating $|m|^2 \le C$, we have



Ml < c 



-M)-( T *-[ T 



+ 0^(A 2 ) -< TA 2 + r(A) , (2.80) 



where in the last step we used the fluctuation averaging estimate (|2.75|) with Sj& = to,, and |Tj| -< 
r(A) from ([236]) . 

On the space of constant vectors, (2.53) becomes a scalar equation for the average $[v]$, which can be expanded up to second order. More precisely, assuming $|v_i| \ll 1$, we can expand (2.53) up to second order:

$$-\sum_k s_{ik} v_k + \Upsilon_i = -\frac{1}{m^2} v_i + \frac{1}{m^3} v_i^2 + O(\Lambda^3). \qquad (2.81)$$

In order to get a closed equation for $[v]$, we take the average over $i$:

$$(1 - m^2)[v] - m^{-1}\frac{1}{N}\sum_i v_i^2 = -m^2[\Upsilon] + O(\Lambda^3). \qquad (2.82)$$

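The nonlinear term is estimated by

$$\frac{1}{N}\sum_i v_i^2 = [v]^2 + \frac{1}{N}\sum_i (v_i - [v])^2 = [v]^2 + O_\prec\big((\widetilde\Gamma\Lambda^2 + r(\Lambda))^2\big),$$

where (2.80) was used. The average $[\Upsilon]$ can be estimated by $O_\prec(\Lambda_o^2)$ as in Proposition 2.14. Moreover, $\Lambda_o$ is estimated by $r(\Lambda)$ as in Lemma 2.15. We thus obtain a quadratic equation for the scalar quantity $[v]$:

$$(1 - m^2)[v] - m^{-1}[v]^2 = O_\prec\big(\Lambda^3 + (\widetilde\Gamma\Lambda^2 + r(\Lambda))^2\big). \qquad (2.83)$$
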

The main control parameter in this proof is $\Theta = |[v]|$, and the key iterative scheme is formulated in terms of $\Theta$. However, many intermediate estimates still involve $\Lambda$. In particular, the self-consistent equation (2.53) is effective only in the regime where $v_i$ is already small, and in the calculation above we tacitly used that $|v_i| \ll 1$. Hence we need a preparatory step to prove an apriori bound on $\Lambda$, essentially showing that $\Lambda \ll 1$; in fact we will need $\Lambda \ll \widetilde\Gamma^{-1}$ (compare with Proposition 2.12). This proof itself is a continuity argument similar to the proof of Proposition 2.12; now, however, we have to follow $\Lambda$ and $\Theta$ in tandem. The main reason why $\Theta$ is already involved in this part is that we work in the larger spectral domain $\eta \ge \widetilde\eta_E$, defined by using $\widetilde\Gamma$. Thus, already in this preparatory step, the self-consistent equation has to be solved separately on the subspace of constants and its orthogonal complement. We omit these details here.

The second preparatory step is to control $\Lambda$ in terms of $\Theta$, which allows us to express the error terms in the self-consistent equation (2.83) in terms of $\Theta = |[v]|$. We first notice that (2.80) implies

$$|v_i| \prec \Theta + \widetilde\Gamma\Lambda^2 + r(\Lambda),$$

i.e.

$$\Lambda \prec \Theta + \widetilde\Gamma\Lambda^2 + r(\Lambda).$$

Using the apriori bound $\Lambda \ll \widetilde\Gamma^{-1}$ we get

$$\Lambda \prec \Theta + r(\Lambda).$$

This equation is analogous to (2.51) and can be iterated as in the application of Proposition 2.13 leading to (2.52) to obtain

$$\Lambda \prec \Theta + \sqrt{\frac{\operatorname{Im} m}{M\eta}} + \frac{1}{M\eta}. \qquad (2.84)$$

Plugging this bound into (2.83), we have a self-consistent equation for the scalar quantity $[v]$ since $\Theta = |[v]|$. An elementary calculation, using the apriori bound on $\Lambda$, yields



(1 - m 2 )H - m'M 2 = ^ (p(6) 2 + M-^6 2 ), p(9) := J + ^ (2-85) 

Finally, in the main step we solve the quadratic inequality (2.85) for $\Theta$. If we neglect the error term in (2.85), then the equation reduces to

$$(1 - m^2)[v] = m^{-1}[v]^2, \qquad (2.86)$$

which has two solutions: either $[v] = 0$ or $[v] = m(1 - m^2)$. Away from the spectral edge we have $|1 - m^2| \ge c$ with some positive constant $c$, so the two solutions are separated and they both are stable under small perturbations. The second solution would mean that $[v]$ is strictly separated away from zero. But this can be excluded by a continuity argument: for large $\eta$, say $\eta = 2$, it is easy to prove that $[v]$ is small. Since $[v]$ is a continuous function of $\eta$, we find that as $\eta$ decreases continuously, $[v]$ cannot suddenly jump from a value near zero to a value near $m(1 - m^2)$. Thus $[v]$ must remain in the vicinity of the zero solution to (2.86). This completes the sketch of the proof of Theorem 2.3 for the more general case $\eta \ge \widetilde\eta_E$. □
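
The dichotomy between the two roots can be illustrated numerically. The following Python sketch is only an illustration under stated assumptions: it uses the explicit Stieltjes transform $m(z)$ of the semicircle law, takes the quadratic in the form $(1-m^2)v - m^{-1}v^2 = \mathrm{err}$ that results from averaging the second order expansion, and treats the error term as a given small number `err`. The names `m_sc` and `roots` are hypothetical helpers, not notation from the text.

```python
import cmath


def m_sc(z: complex) -> complex:
    """Stieltjes transform of the semicircle law: the root of m^2 + z*m + 1 = 0 with Im m > 0."""
    root = cmath.sqrt(z * z - 4)
    m = (-z + root) / 2
    return m if m.imag > 0 else (-z - root) / 2


def roots(z: complex, err: complex):
    """Both solutions v of (1 - m^2) v - v^2 / m = err, returned as (small, large)."""
    m = m_sc(z)
    # Multiply by -m: v^2 - m (1 - m^2) v + m * err = 0, then use the quadratic formula.
    b = m * (1 - m * m)
    disc = cmath.sqrt(b * b - 4 * m * err)
    r1, r2 = (b - disc) / 2, (b + disc) / 2
    return (r1, r2) if abs(r1) <= abs(r2) else (r2, r1)


if __name__ == "__main__":
    # Follow both roots at bulk energy E = 0.5 as eta decreases from 2 (assumed error size 1e-4).
    for eta in (2.0, 1.0, 0.1, 0.01):
        small, large = roots(0.5 + 1j * eta, 1e-4)
        print(eta, abs(small), abs(large))
```

In this toy setting the small root stays of the order of `err` while the large root stays near $m(1-m^2)$, so a quantity that depends continuously on $\eta$ and starts near zero cannot switch branches as long as the two roots remain separated.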



3 Universality of the correlation functions for Wigner matrices 

In this section we sketch the proof of Theorem 1.2.

3.1 Dyson Brownian motion and the local relaxation flow 
3.1.1 Concept and results 

The Dyson Brownian motion (DBM) describes the evolution of the eigenvalues of a Wigner matrix as an interacting point process if each matrix element $h_{ij}$ evolves according to independent (up to symmetry restriction) Brownian motions. We will slightly alter this definition by generating the dynamics of the matrix elements by an Ornstein-Uhlenbeck (OU) process which leaves the standard Gaussian distribution invariant. In the Hermitian case, the OU process for the rescaled matrix elements $\zeta_{ij} := N^{1/2} h_{ij}$ is given by the stochastic differential equation

$$d\zeta_{ij} = d\beta_{ij} - \frac{1}{2}\zeta_{ij}\,dt, \qquad i,j = 1, 2, \ldots, N, \qquad (3.1)$$

where $\beta_{ij}$, $i < j$, are independent complex Brownian motions with variance one and $\beta_{ii}$ are real Brownian motions of the same variance. Denote the distribution of the eigenvalues $\boldsymbol\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_N)$ of $H_t$ at time $t$ by $f_t(\boldsymbol\lambda)\mu(d\boldsymbol\lambda)$ where $\mu$ is given by (1.10) with the Gaussian potential $V(x) = x^2/2$.
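
As a small sanity check of why the OU dynamics (3.1) leaves the unit variance invariant, the following Python sketch compares the exact variance of the OU transition with an Euler integration of the variance equation $\frac{d}{dt}\mathrm{Var} = 1 - \mathrm{Var}$. This is only an illustration for a single real entry driven by a Brownian motion of variance one; the function names are hypothetical.

```python
import math


def ou_variance(var0: float, t: float) -> float:
    """Variance at time t of d zeta = d beta - (1/2) zeta dt, started with variance var0.

    Uses the exact solution zeta_t = e^{-t/2} zeta_0 + (1 - e^{-t})^{1/2} g,
    where g is a standard Gaussian independent of zeta_0.
    """
    return math.exp(-t) * var0 + (1.0 - math.exp(-t))


def euler_variance(var0: float, t: float, steps: int = 100_000) -> float:
    """Euler integration of d/dt Var = 1 - Var, which follows from Ito's formula."""
    var, dt = var0, t / steps
    for _ in range(steps):
        var += (1.0 - var) * dt
    return var


if __name__ == "__main__":
    print(ou_variance(1.0, 0.3))                           # stays at the invariant value
    print(ou_variance(4.0, 1.0), euler_variance(4.0, 1.0))  # relaxes towards 1 on time scale of order one
```

The exact transition used in `ou_variance` is the scalar counterpart of the matrix representation $H_t = e^{-t/2}H_0 + (1-e^{-t})^{1/2}U$ in (3.6).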

Then $f_t = f_{t,N}$ satisfies [21]

$$\partial_t f_t = \mathscr{L} f_t, \qquad (3.2)$$

where 




The parameter $\beta$ is chosen as follows: $\beta = 2$ for complex Hermitian matrices and $\beta = 1$ for symmetric real matrices. Our formulation of the problem has already taken into account Dyson's observation that the invariant measure for this dynamics is $\mu$. A natural question regarding the DBM is how fast the dynamics reaches equilibrium. Dyson had already posed this question in 1962:

Dyson's conjecture [24] : The global equilibrium of DBM is reached in time of order one and the 
local equilibrium (in the bulk) is reached in time of order 1/N. Dyson further remarked, 

"The picture of the gas coming into equilibrium in two well-separated stages, with micro- 
scopic and macroscopic time scales, is suggested with the help of physical intuition. A 
rigorous proof that this picture is accurate would require a much deeper mathematical anal- 
ysis. " 

We will prove that Dyson's conjecture is correct if the initial data of the flow is a Wigner ensemble, which was Dyson's original interest. Our result in fact is valid for DBM with much more general initial data that we now survey. Briefly, it will turn out that the global equilibrium is indeed reached within a time of order one, but local equilibrium is achieved much faster if an a-priori estimate on the location of the eigenvalues (also called points) is satisfied. To formulate this estimate, let $\gamma_j = \gamma_{j,N}$ denote the location of the $j$-th point under the semicircle law, i.e., the classical location of the $j$-th eigenvalue introduced in Section 2.

A-priori Estimate: There exists an $a > 0$ such that

$$Q = Q_a := \sup_{t \ge N^{-2a}} \frac{1}{N}\int \sum_{j=1}^N (\lambda_j - \gamma_j)^2 f_t(\boldsymbol\lambda)\,\mu(d\boldsymbol\lambda) \le C N^{-1-2a} \qquad (3.4)$$

with a constant $C$ uniformly in $N$. This condition first appeared in [40].

The main result on the local ergodicity of Dyson Brownian motion states that if the a-priori estimate (3.4) is satisfied then the local correlation functions of the measure $f_t\mu$ are the same as the corresponding ones for the Gaussian measure, $\mu = f_\infty\mu$, provided that $t$ is larger than $N^{-2a}$. The $n$-point correlation functions of the probability measure $f_t\,d\mu$ are defined, similarly to (1.4), by

$$p^{(n)}_{t,N}(x_1, x_2, \ldots, x_n) = \int_{\mathbb{R}^{N-n}} f_t(\mathbf{x})\mu(\mathbf{x})\,dx_{n+1}\ldots dx_N, \qquad \mathbf{x} = (x_1, x_2, \ldots, x_N). \qquad (3.5)$$
Due to the convention that one can view the locations of eigenvalues as the coordinates of particles, we have used $x_j$ instead of $\lambda_j$ in the last equation. From now on, we will use both conventions depending on which viewpoint we wish to emphasize. Notice that the probability distribution of the eigenvalues at the time $t$, $f_t\mu$, is the same as that of the Gaussian divisible matrix:

$$H_t = e^{-t/2}H_0 + (1 - e^{-t})^{1/2}\,U, \qquad (3.6)$$

where $H_0$ is the initial generalized Wigner matrix and $U$ is an independent standard GUE (or GOE) matrix. This establishes the universality of the Gaussian divisible ensembles. The precise statement is the following theorem:

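Theorem 3.1. [41, Theorem 2.1] Suppose that the a-priori estimate (3.4) holds for the solution $f_t$ of the forward equation (3.2) with some exponent $a > 0$. Let $E \in (-2, 2)$ and $b > 0$ such that $[E - b, E + b] \subset (-2, 2)$. Then for any $\varepsilon > 0$, for any integer $n \ge 1$ and for any compactly supported continuous test function $O : \mathbb{R}^n \to \mathbb{R}$, we have

$$\lim_{N\to\infty}\ \sup_{t \ge N^{-2a+\varepsilon}} \int_{E-b}^{E+b}\frac{dE'}{2b}\int_{\mathbb{R}^n} d\alpha_1 \ldots d\alpha_n\, O(\alpha_1, \ldots, \alpha_n)\,\frac{1}{\varrho_{sc}(E)^n}\Big(p^{(n)}_{t,N} - p^{(n)}_{G,N}\Big)\Big(E' + \frac{\alpha_1}{N\varrho_{sc}(E)}, \ldots, E' + \frac{\alpha_n}{N\varrho_{sc}(E)}\Big) = 0. \qquad (3.7)$$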

We can choose $b = b_N$ depending on $N$. In [H] explicit bounds on the speed of convergence and the optimal range of $b$ were also established. In particular, thanks to the optimal rigidity estimate (2.22), which implies that (3.4) holds with any $a < 1/2$, the range of the energy averaging in (3.7) can be reduced to $b_N \ge N^{-1+\xi}$, $\xi > 0$, but only for $t \ge N^{-\xi/8}$ (Theorem 2.3 of [47]).

Theorem 3.1 is a consequence of the following theorem which identifies the averaged gap distribution of the eigenvalues.

Theorem 3.2 (Universality of the Dyson Brownian motion for short time). [41, Theorem 4.1] Suppose $\beta \ge 1$ and let $O : \mathbb{R} \to \mathbb{R}$ be a smooth function with compact support. Then for any sufficiently small $\varepsilon > 0$, independent of $N$, there exist constants $C, c > 0$, depending only on $\varepsilon$ and $O$, such that for any $J \subset \{1, 2, \ldots, N-1\}$ we have

$$\Big|\int \frac{1}{|J|}\sum_{i\in J} O(N(x_i - x_{i+1}))\, f_t\,d\mu - \int \frac{1}{|J|}\sum_{i\in J} O(N(x_i - x_{i+1}))\,d\mu\Big| \le C N^{\varepsilon}\sqrt{\frac{N^2 Q}{|J|\,t}} + C e^{-cN^{\varepsilon}}. \qquad (3.8)$$

In particular, if the a-priori estimate (3.4) holds with some $a > 0$ and $|J|$ is of order $N$, then for any $t > N^{-2a+3\varepsilon}$ the right hand side converges to zero as $N \to \infty$; i.e. the gap distributions for $f_t\,d\mu$ and $d\mu$ coincide.
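
The exponent bookkeeping behind the last statement can be checked mechanically. The sketch below, with hypothetical names, computes the power of $N$ in the first term $N^{\varepsilon}\sqrt{N^2 Q/(|J| t)}$ of the bound, assuming $Q = N^{-1-2a}$, $|J| = N$ and $t = N^{t_{\exp}}$, and ignoring constants and the exponentially small term.

```python
from fractions import Fraction


def rhs_exponent(a, eps, t_exp, j_exp=Fraction(1)):
    """Power of N in N^eps * sqrt(N^2 * Q / (|J| * t)).

    Assumptions: Q = N^(-1 - 2a), |J| = N^j_exp, t = N^t_exp (constants dropped).
    """
    q_exp = -1 - 2 * a
    return eps + (2 + q_exp - j_exp - t_exp) / 2


if __name__ == "__main__":
    a, eps = Fraction(1, 4), Fraction(1, 100)
    # With t = N^(-2a + 3 eps) the exponent equals -eps / 2, so the bound tends to zero.
    print(rhs_exponent(a, eps, -2 * a + 3 * eps))
    # With t = N^(-2a) the exponent is +eps and the bound is not useful.
    print(rhs_exponent(a, eps, -2 * a))
```

Exact rational arithmetic is used so that the sign of the exponent is not obscured by rounding.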

The test functions can be generalized to

$$O\big(N(x_i - x_{i+1}),\, N(x_{i+1} - x_{i+2}),\, \ldots,\, N(x_{i+n-1} - x_{i+n})\big) \qquad (3.9)$$

for any fixed $n$, which is needed to identify higher order correlation functions. In applications, $J$ is chosen to be the set of indices of the eigenvalues in the interval $[E - b, E + b]$ and thus $|J| \sim Nb$. This identifies the averaged gap distributions of eigenvalues and thus also identifies the correlation functions after energy averaging. We will not explain here in detail how to pass information from gap distribution to correlation functions (see Section 7 in [H]), but we note that this transfer is relatively easy if both statistics are averaged on a scale larger than the typical fluctuation of a single eigenvalue (which is smaller than $N^{-1+\varepsilon}$ in the bulk by (2.22)). This yields Theorem 3.1. Note that the input of this theorem, the apriori estimate (3.4), identifies the location of the eigenvalues only on a scale $N^{-1/2-a}$, which is much weaker than the $1/N$ precision for the eigenvalue differences in (3.8).

By the rigidity estimate (2.22), the a-priori estimate (3.4) holds for any $a < 1/2$ if the initial data of the DBM is a generalized Wigner ensemble. Therefore, Theorem 3.2 holds for any $t \ge N^{-1+\varepsilon}$ for any $\varepsilon > 0$ and this establishes Dyson's conjecture for any generalized Wigner matrices.



3.1.2 Main ideas behind the proof of Theorem 3.2

The key method is to analyze the relaxation to equilibrium of the dynamics (3.2). This approach was first introduced in Section 5.1 of [3D]; the presentation here follows [41].

We start with a short review of the logarithmic Sobolev inequality for a general measure. Let the probability measure $\mu$ on $\mathbb{R}^N$ be given by a general Hamiltonian $\mathcal{H}$:

$$d\mu(\mathbf{x}) = \frac{e^{-N\mathcal{H}(\mathbf{x})}}{Z}\,d\mathbf{x}. \qquad (3.10)$$

In applications $\mu$ will be the Gaussian equilibrium measure, (1.10) with $V(x) = x^2/2$, so we use the same notation $\mu$, but the statements in the beginning of this section hold for a general measure. Let $\mathscr{L}$ be the generator, symmetric with respect to the measure $d\mu$, defined by the associated Dirichlet form

$$D(f) = D_\mu(f) = -\int f \mathscr{L} f\,d\mu := \frac{1}{2N}\sum_j \int (\partial_j f)^2\,d\mu, \qquad \partial_j = \partial_{x_j}. \qquad (3.11)$$

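Recall the relative entropy of two probability measures:

$$S(\nu|\mu) := \int \frac{d\nu}{d\mu}\log\Big(\frac{d\nu}{d\mu}\Big)\,d\mu.$$

If $d\nu = f\,d\mu$, then we will sometimes use the notation $S_\mu(f) := S(f\mu|\mu)$. The entropy can be used to control the total variation norm via the well known inequality

$$\int |f - 1|\,d\mu \le \sqrt{2 S_\mu(f)}. \qquad (3.12)$$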

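Let $f_t$ be the solution to the evolution equation

$$\partial_t f_t = \mathscr{L} f_t, \qquad t > 0, \qquad (3.13)$$

with a given initial condition $f_0$. The evolution of the entropy $S_\mu(f_t) = S(f_t\mu|\mu)$ satisfies

$$\partial_t S_\mu(f_t) = -4 D_\mu(\sqrt{f_t}). \qquad (3.14)$$

Following Bakry and Emery [6], the evolution of the Dirichlet form satisfies the inequality

$$\partial_t D_\mu(\sqrt{f_t}) \le -\frac{1}{2N}\int (\nabla\sqrt{f_t})(\nabla^2\mathcal{H})\nabla\sqrt{f_t}\,d\mu. \qquad (3.15)$$

If the Hamiltonian is convex, i.e.,

$$\nabla^2\mathcal{H}(\mathbf{x}) = \operatorname{Hess}\mathcal{H}(\mathbf{x}) \ge \varpi \qquad \text{for all } \mathbf{x} \in \mathbb{R}^N \qquad (3.16)$$

with some constant $\varpi > 0$, then we have

$$\partial_t D_\mu(\sqrt{f_t}) \le -\varpi D_\mu(\sqrt{f_t}). \qquad (3.17)$$

Integrating (3.14) and (3.17) back from infinity to 0, we obtain the logarithmic Sobolev inequality (LSI)

$$S_\mu(f) \le \frac{4}{\varpi} D_\mu(\sqrt{f}), \qquad f = f_0, \qquad (3.18)$$

and the exponential relaxation of the entropy and Dirichlet form on time scale $t \sim 1/\varpi$:

$$S_\mu(f_t) \le e^{-t\varpi} S_\mu(f_0), \qquad D_\mu(\sqrt{f_t}) \le e^{-t\varpi} D_\mu(\sqrt{f_0}). \qquad (3.19)$$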

As a consequence of the logarithmic Sobolev inequality, we also have the concentration inequality for any $k$ and $a > 0$:

$$\mathbb{P}^\mu\big(|x_k - \mathbb{E}_\mu(x_k)| > a\big) \le 2 e^{-\varpi N a^2/2}. \qquad (3.20)$$

We will not use this inequality in this section, but it will become important in Section 4.1.

Returning to the classical ensembles, we assume from now on that $\mathcal{H}$ is given by (1.10) with $V(x) = x^2/2$ and the equilibrium measure $\mu$ is the Gaussian one. We then have the convexity inequality

$$\mathbf{v}\cdot\nabla^2\mathcal{H}(\mathbf{x})\,\mathbf{v} \ge \frac{1}{2}\|\mathbf{v}\|^2 + \frac{1}{N}\sum_{i<j}\frac{(v_i - v_j)^2}{(x_i - x_j)^2} \ge \frac{1}{2}\|\mathbf{v}\|^2, \qquad \mathbf{v} \in \mathbb{R}^N. \qquad (3.21)$$

This guarantees that $\mu$ satisfies the LSI with $\varpi = 1/2$ and the relaxation time to equilibrium is of order one.

The key idea is that the relaxation time is in fact much shorter than order one for local observables that depend only on the eigenvalue differences. Equation (3.21) shows that the relaxation in the direction $v_i - v_j$ is much faster than order one provided that $x_i$ and $x_j$ are close. However, this effect is hard to exploit directly because all modes of different wavelengths are coupled. Our idea is to add an auxiliary strongly convex potential $W(\mathbf{x})$ to the Hamiltonian to "speed up" the convergence to local equilibrium. On the other hand, we will also show that the cost of this speeding up can be effectively controlled if the a-priori estimate (3.4) holds.

The auxiliary potential $W(\mathbf{x})$ is defined by

$$W(\mathbf{x}) := \sum_{j=1}^N W_j(x_j), \qquad W_j(x) := \frac{1}{2\tau}(x_j - \gamma_j)^2, \qquad (3.22)$$

i.e. it is a quadratic confinement on scale $\sqrt{\tau}$ for each eigenvalue near its classical location, where the parameter $\tau > 0$ will be chosen later. The total Hamiltonian is given by

$$\widetilde{\mathcal{H}} := \mathcal{H} + W, \qquad (3.23)$$

where $\mathcal{H}$ is the Gaussian Hamiltonian given by (1.10). The measure with Hamiltonian $\widetilde{\mathcal{H}}$,

$$d\omega := \omega(\mathbf{x})\,d\mathbf{x}, \qquad \omega := e^{-N\widetilde{\mathcal{H}}}/\widetilde{Z}, \qquad (3.24)$$

will be called the local relaxation measure.

The local relaxation flow is defined to be the flow with the generator characterized by the natural Dirichlet form w.r.t. $\omega$, explicitly, $\widetilde{\mathscr{L}}$:

$$\widetilde{\mathscr{L}} = \mathscr{L} - \sum_j b_j\partial_j, \qquad b_j = W_j'(x_j) = \frac{x_j - \gamma_j}{\tau}. \qquad (3.25)$$

We will choose $\tau \ll 1$ so that the additional term $W$ substantially increases the lower bound (3.16) on the Hessian, hence speeding up the dynamics so that the relaxation time is at most $\tau$.

The idea of adding an artificial potential W to speed up the convergence appears to be unnatural 
here. The current formulation is a streamlined version of a much more complicated approach 
that appeared in [ID] and which took ideas from the earlier work |36j. Roughly speaking, in 
hydrodynamical limit, the short wavelength modes always have shorter relaxation times than the 
long wavelength modes. A direct implementation of this idea is extremely complicated due to the 
logarithmic interaction that couples short and long wavelength modes. Adding a strongly convex 
auxiliary potential W(x) shortens the relaxation time of the long wavelength modes, but it does 
not affect the short modes, i.e. the local statistics, which are our main interest. The analysis of 
the new system is much simpler since now the relaxation is faster, uniform for all modes. Finally, 
we need to compare the local statistics of the original system with those of the modified one. It 
turns out that the difference is governed by $(\nabla W)^2$ which can be directly controlled by the a-priori estimate (3.4).

Our method for enhancing the convexity of $\mathcal{H}$ is reminiscent of a standard convexification idea
concerning metastable states. To explain the similarity, consider a particle near one of the local 
minima of a double well potential separated by a local maximum, or energy barrier. Although the 
potential is not convex globally, one may still study a reference problem defined by convexifying 
the potential along with the well in which the particle initially resides. Before the particle reaches 
the energy barrier, there is no difference between these two problems. Thus questions concerning 
time scales shorter than the typical escape time can be conveniently answered by considering 
the convexified problem; in particular the escape time in the metastability problem itself can be 
estimated by using convex analysis. Our DBM problem is already convex, but not sufficiently 
convex. The modification by adding W enhances convexity without altering the local statistics. 
This is similar to the convexification in the metastability problem which does not alter events before 
the escape time. 

3.1.3 Some details on the proof of Theorem 3.2

The core of the proof is divided into three theorems. For the flow with generator $\widetilde{\mathscr{L}}$, we have the following estimates on the entropy and Dirichlet form.

Theorem 3.3. Consider the forward equation 




and the logarithmic Sobolev inequality 




(3.29) 



with a universal constant $C$. Thus the relaxation time to equilibrium is of order $\tau$:





(3.30) 
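Proof. Denote $h = \sqrt{q_t}$; then we have the equation

$$\partial_t D_\omega(h) = \partial_t \frac{1}{2N}\int (\nabla h)^2 e^{-N\widetilde{\mathcal{H}}}\,d\mathbf{x} \le -\frac{1}{2N}\int \nabla h\,(\nabla^2\widetilde{\mathcal{H}})\,\nabla h\; e^{-N\widetilde{\mathcal{H}}}\,d\mathbf{x}. \qquad (3.31)$$

In our case, (3.21) and (3.22) imply that the Hessian of $\widetilde{\mathcal{H}}$ is bounded from below as

$$\nabla h\,(\nabla^2\widetilde{\mathcal{H}})\,\nabla h \ge \frac{C}{\tau}\sum_j (\partial_j h)^2 + \frac{1}{2N}\sum_{i \ne j}\frac{1}{(x_i - x_j)^2}(\partial_i h - \partial_j h)^2 \qquad (3.32)$$

with some positive constant $C$. This proves (3.27) and (3.28). The rest can be proved by straightforward arguments analogously to (3.14)-(3.19). □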

Notice that the estimate (3.28) is additional information that we extracted from the Bakry-Emery argument by using the second term in the Hessian estimate (3.21). It plays a key role in the next theorem.

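Theorem 3.4 (Dirichlet form inequality). Let $q$ be a probability density, $\int q\,d\omega = 1$, and let $O : \mathbb{R} \to \mathbb{R}$ be a smooth function with compact support. Then for any $J \subset \{1, 2, \ldots, N-1\}$ and any $t > 0$ we have

$$\Big|\int \frac{1}{|J|}\sum_{i\in J} O(N(x_i - x_{i+1}))\,q\,d\omega - \int \frac{1}{|J|}\sum_{i\in J} O(N(x_i - x_{i+1}))\,d\omega\Big| \le C\Big(t\,\frac{D_\omega(\sqrt{q})}{|J|}\Big)^{1/2} + C\sqrt{S_\omega(q)}\,e^{-ct/\tau}. \qquad (3.33)$$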

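Proof. For simplicity, we assume that $J = \{1, 2, \ldots, N-1\}$. Let $q_t$ satisfy

$$\partial_t q_t = \widetilde{\mathscr{L}} q_t, \qquad t \ge 0,$$

with an initial condition $q_0 = q$. We write

$$\int \Big[\frac{1}{|J|}\sum_{i\in J} O(N(x_i - x_{i+1}))\Big](q - 1)\,d\omega = \int \Big[\frac{1}{|J|}\sum_{i\in J} O(N(x_i - x_{i+1}))\Big](q - q_t)\,d\omega + \int \Big[\frac{1}{|J|}\sum_{i\in J} O(N(x_i - x_{i+1}))\Big](q_t - 1)\,d\omega. \qquad (3.34)$$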

The second term in (3.34) can be estimated by (3.12), the decay of the entropy (3.30) and the boundedness of $O$; this gives the second term in (3.33).

To estimate the first term in (3.34), by the evolution equation $\partial_t q_t = \widetilde{\mathscr{L}} q_t$ and the definition of $\widetilde{\mathscr{L}}$ we have

$$\int \frac{1}{|J|}\sum_{i\in J} O(N(x_i - x_{i+1}))\,q_t\,d\omega - \int \frac{1}{|J|}\sum_{i\in J} O(N(x_i - x_{i+1}))\,q\,d\omega = -\int_0^t ds\int \frac{1}{2|J|}\sum_{i\in J} O'(N(x_i - x_{i+1}))\big[\partial_i q_s - \partial_{i+1} q_s\big]\,d\omega.$$



From the Schwarz inequality and $\partial q = 2\sqrt{q}\,\partial\sqrt{q}$, the last term is bounded by

$$\Big[\int_0^t ds\int \frac{N^2}{|J|^2}\sum_{i\in J}\big|O'(N(x_i - x_{i+1}))\big|^2 (x_i - x_{i+1})^2\,q_s\,d\omega\Big]^{1/2}\Big[\int_0^t ds\int \frac{1}{N^2}\sum_{i\in J}\frac{1}{(x_i - x_{i+1})^2}\big[\partial_i\sqrt{q_s} - \partial_{i+1}\sqrt{q_s}\big]^2\,d\omega\Big]^{1/2} \le C\Big(\frac{D_\omega(\sqrt{q})\,t}{|J|}\Big)^{1/2}, \qquad (3.35)$$

where we have used (3.28) and that $|O'(N(x_i - x_{i+1}))|^2 (x_i - x_{i+1})^2 \le CN^{-2}$ due to $O$ being smooth and compactly supported. □

Alternatively, we could have directly estimated the left hand side of (3.33) by using the total variation norm between $q\omega$ and $\omega$, which in turn could be estimated by the entropy (3.12) and the Dirichlet form using the logarithmic Sobolev inequality, i.e., by

$$C\int |q - 1|\,d\omega \le C\sqrt{S_\omega(q)} \le C\sqrt{\tau D_\omega(\sqrt{q})}. \qquad (3.36)$$

However, compared with this simple bound, the estimate (3.33) gains an extra factor $|J| \sim N$ in the denominator, i.e. it is in terms of Dirichlet form per particle. The improvement is due to the fact that the observable in (3.33) depends only on the gap, i.e. difference of points. This allows us to exploit the additional term (3.28) gained in the Bakry-Emery argument. This is a manifestation of the general observation that gap-observables behave much better than point-observables.

The final ingredient in proving Theorem 3.2 is the following entropy and Dirichlet form estimates.

Theorem 3.5. Suppose that (3.21) holds. Let $a > 0$ be fixed and recall the definition of $Q = Q_a$ from (3.4). Fix a constant $\tau \ge N^{-2a}$ and consider the local relaxation measure $\omega$ with this $\tau$. Set $\psi := \omega/\mu$ and let $g_t := f_t/\psi$. Suppose there is a constant $m$ such that

$$S(f_\tau\mu|\omega) \le CN^m. \qquad (3.37)$$

Then for any $t \ge \tau N^{\varepsilon}$ the entropy and the Dirichlet form satisfy the estimates:

$$S(g_t\omega|\omega) \le CN^2 Q\tau^{-1}, \qquad D_\omega(\sqrt{g_t}) \le CN^2 Q\tau^{-2}, \qquad (3.38)$$

where the constants depend on $\varepsilon$ and $m$.

Proof. The evolution of the entropy $S(f_t\mu|\omega) = S_\omega(g_t)$ can be computed explicitly by the formula [8]



j J J 

Hence we have, by using (3.25),

j J J j J 

Since $\omega$ is $\widetilde{\mathscr{L}}$-invariant and time independent, the middle term on the right hand side vanishes, and from the Schwarz inequality

$$\partial_t S(f_t\mu|\omega) \le -D_\omega(\sqrt{g_t}) + CN\int \sum_j b_j^2\,g_t\,d\omega \le -D_\omega(\sqrt{g_t}) + CN^2 Q\tau^{-2}. \qquad (3.39)$$

Notice that (3.39) is reminiscent of (3.14) for the derivative of the entropy of the measure $g_t\omega = f_t\mu$ with respect to $\omega$. The difference is, however, that $g_t$ does not satisfy the evolution equation with the generator $\widetilde{\mathscr{L}}$. The last term in (3.39) expresses the error.

Together with the logarithmic Sobolev inequality (3.29), we have

$$\partial_t S(f_t\mu|\omega) \le -D_\omega(\sqrt{g_t}) + CN^2 Q\tau^{-2} \le -C\tau^{-1} S(f_t\mu|\omega) + CN^2 Q\tau^{-2}. \qquad (3.40)$$

Integrating the last inequality from $\tau$ to $t$ and using the assumption (3.37) and $t \ge \tau N^{\varepsilon}$, we have proved the first inequality of (3.38). Using this result and integrating (3.39), we have



j D u (J£)ds ^ CN 2 Qt-\ 



By the convexity of the Hamiltonian, $D_\mu(\sqrt{f_t})$ is decreasing in $t$. Since $D_\omega(\sqrt{g_t}) \le C D_\mu(\sqrt{f_t}) + CN^2 Q\tau^{-2}$, this proves the second inequality of (3.38). □

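Finally, we complete the proof of Theorem 3.2. For any given $t > 0$ we now choose $\tau := tN^{-\varepsilon}$ and we construct the local relaxation measure $\omega$ with this $\tau$. Set $\psi = \omega/\mu$ and let $q := g_t = f_t/\psi$ be the density $q$ in Theorem 3.4. Then Theorem 3.5, Theorem 3.4 and an easy bound on the entropy $S_\omega(q) \le CN^m$ imply that

$$\Big|\int \frac{1}{|J|}\sum_{i\in J} O(N(x_i - x_{i+1}))\,(f_t\,d\mu - d\omega)\Big| \le C\Big(t\,\frac{D_\omega(\sqrt{q})}{|J|}\Big)^{1/2} + C\sqrt{S_\omega(q)}\,e^{-cN^{\varepsilon}} \le C\Big(t\,\frac{N^2 Q}{|J|\,\tau^2}\Big)^{1/2} + Ce^{-cN^{\varepsilon}} \le CN^{\varepsilon}\Big(\frac{N^2 Q}{|J|\,t}\Big)^{1/2} + Ce^{-cN^{\varepsilon}}, \qquad (3.41)$$

i.e., the local statistics of $f_t\mu$ and $\omega$ are the same for any initial data $f_\tau$ for which (3.37) is satisfied. Applying the same argument to the Gaussian initial data, $f_0 = f_\tau = 1$, we can also compare $\mu$ and $\omega$. We have thus proved (3.8) and hence the universality. □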

3.2 The Green function comparison theorems and four moment matching 

We now state the Green function comparison theorem, Theorem 3.6. It will quickly lead to Theorem 3.7 stating that the correlation functions of eigenvalues of two matrix ensembles are identical on a scale smaller than $1/N$ provided that the first four moments of all matrix elements of these two ensembles are almost the same. We will state a limited version for real Wigner matrices for simplicity of presentation.

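Theorem 3.6 (Green function comparison). [45, Theorem 2.3] Suppose that we have two $N \times N$ Wigner matrices, $H^{(v)}$ and $H^{(w)}$, with matrix elements $h_{ij}$ given by the random variables $N^{-1/2}v_{ij}$ and $N^{-1/2}w_{ij}$, respectively, with $v_{ij}$ and $w_{ij}$ satisfying the uniform subexponential decay condition (1.17). We assume that the first four moments of $v_{ij}$ and $w_{ij}$ are close to each other in the sense that

$$\big|\mathbb{E} v_{ij}^s - \mathbb{E} w_{ij}^s\big| \le N^{-\delta - 2 + s/2}, \qquad 1 \le s \le 4, \qquad (3.42)$$

holds for some $\delta > 0$. Then there are positive constants $C_1$ and $\varepsilon$, depending on $\vartheta$ and $C_0$ from (1.17), such that for any $\eta$ with $N^{-1-\varepsilon} \le \eta \le N^{-1}$ and for any $z_1, z_2$ with $\operatorname{Im} z_j = \pm\eta$, $j = 1, 2$, we have

$$\lim_{N\to\infty}\Big[\mathbb{E}\operatorname{Tr} G^{(v)}(z_1)\operatorname{Tr} G^{(v)}(z_2) - \mathbb{E}\operatorname{Tr} G^{(w)}(z_1)\operatorname{Tr} G^{(w)}(z_2)\Big] = 0, \qquad (3.43)$$

where $G^{(v)}$ and $G^{(w)}$ denote the Green functions of $H^{(v)}$ and $H^{(w)}$.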



Here we formulated Theorem 3.6 for a product of two traces of the Green function, but the result holds for a large class of smooth functions depending on several individual matrix elements of the Green functions as well, see [15] for the precise statement. (The matching condition (3.42) is slightly weaker than in [35], but the proof in [15] without any change yields this slightly stronger version.) This general version of Theorem 3.6 implies that the correlation functions of the two ensembles at the scale $1/N$ are identical:

Theorem 3.7 (Correlation function comparison). [45, Theorem 6.4] Suppose the assumptions of Theorem 3.6 hold. Let $p^{(n)}_{v,N}$ and $p^{(n)}_{w,N}$ be the $n$-point functions of the eigenvalues w.r.t. the probability law of the matrix $H^{(v)}$ and $H^{(w)}$, respectively. Then for any $|E| < 2$, any $n \ge 1$ and any compactly supported continuous test function $O : \mathbb{R}^n \to \mathbb{R}$ we have

$$\lim_{N\to\infty}\int_{\mathbb{R}^n} d\alpha_1 \ldots d\alpha_n\, O(\alpha_1, \ldots, \alpha_n)\Big(p^{(n)}_{v,N} - p^{(n)}_{w,N}\Big)\Big(E + \frac{\alpha_1}{N}, \ldots, E + \frac{\alpha_n}{N}\Big) = 0. \qquad (3.44)$$

Notice that these comparison theorems hold for any fixed energy E, i.e. no averaging in energy 
is necessary. 

The basic idea for proving Theorem 3.6 is similar to Lindeberg's proof of the central limit theorem, where the random variables are replaced one by one with a Gaussian one. We will replace the matrix elements $v_{ij}$ with $w_{ij}$ one by one and estimate the effect of this change on the resolvent by a resolvent expansion. The idea of applying Lindeberg's method in random matrices was recently used by Chatterjee [J5] for comparing the traces of the Green functions; the idea was also used by Tao and Vu [81] in the context of comparing individual eigenvalue distributions.

The four moment matching condition (3.42) with $\delta = 0$ first appeared in [81]. For comparison with Theorem 3.6 we state here the main result of [81]. Let $\lambda_1 < \lambda_2 < \ldots < \lambda_N$ and $\lambda'_1 < \lambda'_2 < \ldots < \lambda'_N$ denote the eigenvalues of $H$ and $H'$, respectively. Then the joint distributions of any $k$-tuple of eigenvalues on scale $1/N$ are very close to each other in the following sense:

Theorem 3.8 (Four moment theorem for eigenvalues). [81, Theorem 15] Let $H$ and $H'$ be two Wigner matrices. Assume that the first four moments of $h_{ij}$ and $h'_{ij}$ exactly match and the subexponential decay (1.17) holds for the single entry distributions. Then for any sufficiently small positive $\varepsilon$ and $\varepsilon'$ and for any function $F : \mathbb{R}^k \to \mathbb{R}$ satisfying $|\nabla^j F| \le N^{\varepsilon}$, $j \le 5$, and for any selection of $k$-tuple of indices $i_1, i_2, \ldots, i_k \in [\varepsilon' N, (1 - \varepsilon')N]$ away from the edge, we have

$$\Big|\mathbb{E} F\big(N\lambda_{i_1}, N\lambda_{i_2}, \ldots, N\lambda_{i_k}\big) - \mathbb{E}' F\big(N\lambda'_{i_1}, N\lambda'_{i_2}, \ldots, N\lambda'_{i_k}\big)\Big| \le N^{-c_0} \qquad (3.45)$$

with some $c_0 > 0$. The exact moment matching condition can be relaxed to (3.42), but $c_0$ will depend on $\delta$.

Note that the arguments in (3.45) are magnified by a factor $N$ so the result is sufficiently precise to detect individual eigenvalue correlations. Therefore Theorem 3.6 or 3.8 can prove bulk universality for a Wigner matrix $H$ if another $H'$ is found, with matching four moments, for which universality is already proved. This will be explained in Section 3.3.

Both Theorem 3.6 and Theorem 3.8 rely on some version of the local semicircle law on the shortest possible scale. There are, however, three main differences between them.

(i) Theorem 3.6 compares the statistics of eigenvalues of two different ensembles near fixed energies while Theorem 3.8 compares the statistics of the $j_1, j_2, \ldots, j_k$-th eigenvalues for fixed labels $j_1, j_2, \ldots, j_k$.

(ii) Both theorems are of perturbative nature and require some apriori information. Theorem 3.6 uses a bound on the resolvent matrix entries, $|G_{ij}(z)|$, that has already been obtained in the local semicircle law (see, e.g. (2.19)). Theorem 3.8 needs an apriori lower bound on the gaps to exclude possible eigenvalue resonances that may render the expansion unstable. This is achieved by a level repulsion estimate that is the most complicated technical part of [81]. Previously, even more precise level repulsion estimates were obtained in [39] but only for smooth distributions.

(iii) Theorem 3.6 also compares off-diagonal Green function elements, information that cannot be obtained from Theorem 3.8. Hence it directly provides information on the eigenvectors as well, see [58] for the development. In fact, once Theorem 3.6 is proved for all energies, it also implies Theorem 3.8. The reason is that we can integrate correlation functions in energy with a precision smaller than the typical size of the gap, hence eigenvalues with a fixed label can be identified. This was first done near the edge in [32] and later in the bulk in [58].

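Sketch of the proof of Theorem 3.6. We fix a bijective ordering map on the index set of the independent matrix elements,

$$\phi : \{(i,j) : 1 \le i \le j \le N\} \to \{1, \ldots, \gamma(N)\}, \qquad \gamma(N) := \frac{N(N+1)}{2},$$

and denote by $H_\gamma$ the Wigner matrix whose matrix elements $h_{ij}$ follow the $w$-distribution if $\phi(i,j) \le \gamma$ and they follow the $v$-distribution otherwise; in particular $H^{(v)} = H_0$ and $H^{(w)} = H_{\gamma(N)}$.

Consider the telescopic sum of differences of expectations (we present only one resolvent for simplicity of the presentation):

$$\mathbb{E}\Big(\frac{1}{N}\operatorname{Tr}\frac{1}{H^{(w)} - z}\Big) - \mathbb{E}\Big(\frac{1}{N}\operatorname{Tr}\frac{1}{H^{(v)} - z}\Big) = \sum_{\gamma=1}^{\gamma(N)}\Big[\mathbb{E}\Big(\frac{1}{N}\operatorname{Tr}\frac{1}{H_\gamma - z}\Big) - \mathbb{E}\Big(\frac{1}{N}\operatorname{Tr}\frac{1}{H_{\gamma-1} - z}\Big)\Big]. \qquad (3.46)$$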

Let $E^{(ij)}$ denote the matrix whose matrix elements are zero everywhere except at the $(i,j)$ position, where it is 1, i.e., $E^{(ij)}_{k\ell} = \delta_{ik}\delta_{j\ell}$. Fix a $\gamma \ge 1$ and let $(i,j)$ be determined by $\phi(i,j) = \gamma$. We will compare $H_{\gamma-1}$ with $H_\gamma$. Note that these two matrices differ only in the $(i,j)$ and $(j,i)$ matrix elements and they can be written as

$$H_{\gamma-1} = Q + \frac{1}{\sqrt{N}}V, \qquad V := v_{ij}E^{(ij)} + v_{ji}E^{(ji)}, \qquad v_{ji} := \bar v_{ij},$$

$$H_\gamma = Q + \frac{1}{\sqrt{N}}W, \qquad W := w_{ij}E^{(ij)} + w_{ji}E^{(ji)}, \qquad w_{ji} := \bar w_{ij},$$

with a matrix $Q$ that has zero matrix element at the $(i,j)$ and $(j,i)$ positions. By the resolvent expansion,

$$S_{\gamma-1} = R - N^{-1/2}RVR + \ldots + N^{-2}(RV)^4R - N^{-5/2}(RV)^5 S_{\gamma-1}, \qquad R := \frac{1}{Q - z}, \qquad S_{\gamma-1} := \frac{1}{H_{\gamma-1} - z},$$

and a similar expression holds for the resolvent $S_\gamma$ of $H_\gamma$. From the local semicircle law for individual matrix elements (2.19), the matrix elements of all Green functions $R$, $S_{\gamma-1}$, $S_\gamma$ are bounded by $CN^{2\varepsilon}$. Although (2.19) is not directly applicable to $\eta \le N^{-1+\varepsilon}$, it is easy to show that

$$|G_{ij}(E + i\eta)| \le \max_i |G_{ii}(E + i\eta)| \le \frac{\eta'}{\eta}\max_i |G_{ii}(E + i\eta')|,$$

so choosing $\eta' \sim N^{-1+\varepsilon}$ we can prove a bound for $\eta$ slightly below $1/N$ at the expense of a factor $\eta'/\eta$. The estimates of the related resolvents $R$, $S_{\gamma-1}$, $S_\gamma$ are similar.

By assumption (3.42), the difference between the expectation of matrix elements of $S_{\gamma-1}$ and $S_\gamma$ is of order $N^{-2-\delta+C\varepsilon}$. Since the number of steps, $\gamma(N)$, is of order $N^2$, the difference in (3.46) is of order $N^2 N^{-2-\delta+C\varepsilon} \ll 1$, and this proves Theorem 3.6 for a single resolvent. It is very simple to turn this heuristic argument into a rigorous proof and to generalize it to the product of several resolvents. The real difficulty is the input that the resolvent entries can be bounded for a general class of Wigner matrices down to the almost optimal scale $\eta \sim 1/N$.
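
The replacement scheme can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the matrices are represented by dictionaries of their upper triangular entries, the functional `toy` is an arbitrary stand-in for the expectation of a resolvent trace, and all names (`ordering`, `interpolate`, `telescoping_terms`) are hypothetical. It checks that the one-entry replacement differences telescope to the total difference, which is the algebraic identity behind (3.46).

```python
import random


def ordering(n):
    """Bijection phi from {(i, j): 1 <= i <= j <= n} onto {1, ..., n(n+1)/2}."""
    pairs = [(i, j) for i in range(1, n + 1) for j in range(i, n + 1)]
    return {pair: label for label, pair in enumerate(pairs, start=1)}


def interpolate(v, w, phi, gamma):
    """Entries of H_gamma: w-entries where phi(i, j) <= gamma, v-entries elsewhere."""
    return {pair: (w[pair] if phi[pair] <= gamma else v[pair]) for pair in phi}


def telescoping_terms(functional, v, w, phi):
    """Differences F(H_gamma) - F(H_{gamma-1}) for gamma = 1, ..., gamma(N)."""
    return [
        functional(interpolate(v, w, phi, g)) - functional(interpolate(v, w, phi, g - 1))
        for g in range(1, len(phi) + 1)
    ]


# Toy data: Gaussian entries for v, symmetric Bernoulli entries for w.
rng = random.Random(0)
n = 4
phi = ordering(n)
v = {pair: rng.gauss(0.0, 1.0) for pair in phi}
w = {pair: rng.choice([-1.0, 1.0]) for pair in phi}


def toy(h):
    """Arbitrary nonlinear functional standing in for a resolvent trace."""
    return sum(x ** 3 for x in h.values()) + sum(h.values()) ** 2


terms = telescoping_terms(toy, v, w, phi)
```

Each of the `len(phi)` terms changes a single independent entry, so a per-step bound of order $N^{-2-\delta+C\varepsilon}$ summed over order $N^2$ steps gives the total bound stated above.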
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3.3 Universality for generalized Wigner matrices: putting it together 



In this short section we put the previous information together to prove Theorem 1.2. We first focus on the case when $b$ is independent of $N$. Recall that Theorem 3.1 states that the correlation functions of the Gaussian divisible ensemble,

$$H_t = e^{-t/2}H_0 + (1-e^{-t})^{1/2}U, \tag{3.47}$$

where $H_0$ is the initial Wigner matrix and $U$ is an independent standard GUE (or GOE) matrix, are given by the corresponding GUE (or GOE) for $t \ge N^{-2a+\varepsilon}$ provided that the a-priori estimate (3.4) holds for the solution $f_t$ of the forward equation (3.2) with some exponent $a > 0$. Since the rigidity of eigenvalues (2.22) holds uniformly for all generalized Wigner matrices, we have proved (3.4) for $a = 1/2 - \varepsilon$ with any $\varepsilon > 0$.

From the evolution of the OU process (3.47) for $v_{ij} = N^{1/2}h_{ij}$ we have

$$|\mathbb{E}v_{ij}^s(t) - \mathbb{E}v_{ij}^s(0)| \le Ct = CN^{-1+3\varepsilon} \tag{3.48}$$

for $s = 3,4$ and with the choice of $t = N^{-1+3\varepsilon}$. Furthermore, $\mathbb{E}v_{ij}^s(t)$ are independent of $t$ for $s = 1,2$ due to $\mathbb{E}v_{ij}(0) = 0$ and $\mathbb{E}v_{ij}^2(t) = 1$. Hence (3.42) is satisfied for the matrix elements of $H_t$ and $H_0$ and we can thus use Theorem 3.7 to conclude that the correlation functions of $H_t$ and $H_0$ are identical at the scale $1/N$. Since the correlation functions of $H_t$ are given by the corresponding Gaussian case, we have proved Theorem 1.2 under the condition that the probability distribution of the matrix elements decays subexponentially. Finally, we need a technical cutoff argument to relax the decay condition to (1.13) which we omit here (see Section 7 in [32]).
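The moment bound (3.48) can be made concrete for a single real entry. The sketch below assumes the entry evolves as $v(t) = e^{-t/2}v(0) + (1-e^{-t})^{1/2}g$ with an independent standard Gaussian $g$, and that $v(0)$ has mean zero, variance one, third moment `m3` and fourth moment `m4`; the function names and the numerical values in the usage note are illustrative assumptions.

```python
import math

def ou_moments(m3, m4, t):
    """Second, third and fourth moments of v(t) for a real entry.

    Assumes E v(0) = 0, E v(0)^2 = 1, E v(0)^3 = m3, E v(0)^4 = m4 and
    v(t) = e^{-t/2} v(0) + (1 - e^{-t})^{1/2} g with g standard Gaussian.
    """
    a2 = math.exp(-t)      # squared weight of v(0)
    b2 = 1.0 - a2          # squared weight of the Gaussian part
    second = a2 + b2                                        # stays equal to 1
    third = a2 ** 1.5 * m3                                  # Gaussian odd moments vanish
    fourth = a2 ** 2 * m4 + 6.0 * a2 * b2 + 3.0 * b2 ** 2   # uses E g^4 = 3
    return second, third, fourth

def moment_drift(m3, m4, n, eps):
    """Return t = N^{-1+3 eps} and the drifts of the third and fourth moments."""
    t = n ** (-1.0 + 3.0 * eps)
    _, third, fourth = ou_moments(m3, m4, t)
    return t, abs(third - m3), abs(fourth - m4)
```

For instance, `moment_drift(0.8, 4.0, 10**6, 0.05)` gives drifts bounded by a constant multiple of $t$, in line with (3.48), while the second moment stays equal to one for every $t$.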

The argument for $N$-dependent $b = b_N$ in the range $b_N \ge N^{-1+\xi}$, $\xi > 0$, is slightly different. For such a small $b_N$, (3.7) could be established only for relatively large times, $t \ge N^{-\xi/8}$. We cannot therefore compare $H_0$ with $H_t$ directly, since the deviation of the third moments of $v_{ij}(0)$ and $v_{ij}(t)$ in (3.48) would not satisfy (3.42). Instead, we construct an auxiliary Wigner matrix $\widehat H_0$ such that up to the third moment its time evolution $\widehat H_t$ under the OU flow (3.47) matches exactly the original matrix $H_0$ and the fourth moments are close even for $t$ of order $N^{-\xi/8}$ (see Lemma 3.4 of [46]). Theorem 3.1 will then be applied for $\widehat H_t$, and Theorem 3.6 can be used to compare $\widehat H_t$ and $H_0$. This completes the sketch of the proof of Theorem 1.2.



4 Universality of the correlation functions for $\beta$-ensembles

In this section we outline the proof of Theorem 1.4. We will use the notation $\mu = \mu^{(N)}_{\beta,V}$ for the probability measure defining the general $\beta$-ensemble with a potential $V$ on $N$ ordered real points $\lambda_1 \le \ldots \le \lambda_N$, see (1.10). We let $\mathbb{P}^\mu$ and $\mathbb{E}^\mu$ denote the probability and the expectation with respect to $\mu$. The equilibrium density is denoted by $\varrho = \varrho_V$ and its Stieltjes transform by

$$m(z) = m_V(z) = \int_{\mathbb{R}}\frac{\varrho(x)}{x-z}\,dx.$$

The classical location of the $k$-th point will be denoted by $\gamma_k = \gamma_{k,V}$, see (1.22). Note that in this section $\mu$, $\varrho$, $m$ and $\gamma_k$ refer to the quantities related to the general $V$ and not the Gaussian one as in the previous sections. This avoids carrying the $V$ subscripts all the time, just as in Section 2 we dropped the subscripts sc referring to the semicircle law.
4.1 Rigidity estimates

For simplicity of presentation we assume that the potential $V$ is convex, i.e.,

$$\varpi := \frac{1}{2}\inf_{x\in\mathbb{R}} V''(x) > 0, \tag{4.1}$$

the equilibrium density $\varrho(s)$ is supported on a single interval $[A,B] \subset \mathbb{R}$, and satisfies (1.20) (for the general case, see [12]). The Gaussian case corresponds to $V(x) = x^2/2$, in which case the equilibrium density is the semicircle law, $\varrho_{sc}$, given by (1.3). Our main result concerning the universality is Theorem 1.4 and a similar statement holds for the universality of the averaged gap distributions directly. In fact, the proof of Theorem 1.4 goes via the averaged gap distribution as we now explain.

The first step to prove Theorem 1.4 is the following theorem which provides a rigidity estimate on the location of each individual point. The precision in the bulk is almost down to the optimal scale $1/N$, the estimate is weaker near the edges. In the following, we will denote $\llbracket x,y\rrbracket = \mathbb{N}\cap[x,y]$.

Theorem 4.1. [Theorem 3.1 and Lemma 3.6] Fix any $\alpha, \varepsilon > 0$ and assume that (4.1) holds. Then there are constants $\delta, c_1, c_2 > 0$ such that for any $N \ge 1$ and $k \in \llbracket \alpha N, (1-\alpha)N\rrbracket$,

$$\mathbb{P}^\mu\big(|\lambda_k - \gamma_k| > N^{-1+\varepsilon}\big) \le c_1 e^{-c_2N^\delta}. \tag{4.2}$$

The following weaker bound is valid close to the spectral edges:

$$\mathbb{P}^\mu\big(|\lambda_k - \gamma_k| \ge N^{-4/15+\varepsilon}\big) \le c_1 e^{-c_2N^\delta} \tag{4.3}$$

for any $k \in \llbracket N^{3/5+\varepsilon}, N - N^{3/5+\varepsilon}\rrbracket$. Finally, the bound

$$\mathbb{P}^\mu\big(|\lambda_k - \gamma_k| \ge \varepsilon\big) \le c_1 e^{-c_2N^\delta} \tag{4.4}$$

holds for any $k \in \llbracket 1, N\rrbracket$.



We explain some ideas of the proof of (4.2), the arguments for (4.3) are similar and (4.4) follows from an easy large deviation bound, see [11]. The first ingredient to prove (4.2) is an analysis of the loop equation following Johansson [57] and Shcherbina [78]. The equilibrium density $\varrho$, for a convex potential $V$, is given by

$$\varrho(t) = \frac{1}{\pi}r(t)\sqrt{(t-A)(B-t)}\,\mathbf{1}_{[A,B]}(t), \tag{4.5}$$

where $r$ is a real function that can be extended to an analytic function in $\mathbb{C}$ and $r$ has no zero in $\mathbb{R}$. Denote by $s(z) := -2r(z)\sqrt{(A-z)(B-z)}$ where the square root is defined such that its asymptotic value is $z$ as $z \to \infty$. Recall that the density is the one-point correlation function which is characterized by

$$\int_{\mathbb{R}} d\lambda_1\, O(\lambda_1)\varrho^{(N)}_1(\lambda_1) = \int_{\mathbb{R}^N} O(\lambda_1)\,d\mu^{(N)}_{\beta,V}(\boldsymbol\lambda), \qquad \boldsymbol\lambda = (\lambda_1,\lambda_2,\ldots,\lambda_N). \tag{4.6}$$

Let $\bar m_N$ and $m$ be the Stieltjes transforms of the density $\varrho^{(N)}_1$ and the equilibrium density $\varrho$, respectively. Notice that in Section 2 we have used $m_N$ to denote the Stieltjes transform of the empirical measure (2.11); here $\bar m_N$ denotes the ensemble average of the analogous quantity.
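Define the analytic functions

$$b_N(z) := \int_{\mathbb{R}}\frac{V'(z)-V'(t)}{z-t}\,\big(\varrho^{(N)}_1-\varrho\big)(t)\,dt$$

and

$$c_N(z) := \frac{1}{N^2}k_N(z) + \frac{1}{N}\Big(\frac{2}{\beta}-1\Big)\bar m_N'(z), \qquad\text{with}\quad k_N(z) := \operatorname{Var}_\mu\Big(\sum_{k=1}^N\frac{1}{z-\lambda_k}\Big).$$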


Here for complex random variables $X$ we use the definition that $\operatorname{Var}(X) = \mathbb{E}(X^2) - \mathbb{E}(X)^2$.

The equation used by Johansson (which can be obtained by a change of variables in (4.6) [57] or by integration by parts [78]), is a variation of the loop equation (see, e.g., [IE]) used in the physics literature and it takes the form

$$(\bar m_N - m)^2 + s(\bar m_N - m) + b_N = c_N. \tag{4.7}$$

Equation (4.7) can be used to express the difference $\bar m_N - m$ in terms of $(\bar m_N - m)^2$, $b_N$ and $c_N$. In the regime where $|\bar m_N - m|$ is small, we can neglect the quadratic term. The term $b_N$ is of the same order as $|\bar m_N - m|$ and is difficult to treat. As observed in [HEH], for analytic $V$, this term vanishes when we perform a contour integration. So we have roughly the relation

$$(\bar m_N - m) \sim \frac{1}{N^2}\operatorname{Var}_\mu\Big(\sum_{k=1}^N\frac{1}{z-\lambda_k}\Big), \tag{4.8}$$

where we dropped the less important error involving $\bar m_N'(z)/N$ due to the extra $1/N$ factor. In the convex setting, the variance can be estimated by the logarithmic Sobolev inequality and we immediately obtain an estimate on $\bar m_N - m$. We then use the Helffer-Sjöstrand formula, see (2.24), to estimate the locations of the particles. This will provide us with an accuracy of order $N^{-1/2}$ for $\mathbb{E}^\mu\lambda_k - \gamma_k$. This argument gives only an estimate on the expectation of the locations of the particles since we only have information on the averaged quantity, $\bar m_N$. Although it is tempting to use this new accuracy information on the particles to estimate the variance again in (4.8), the information on the expectation of $\lambda_k$ alone is very difficult to use in a bootstrap argument. To estimate the variance of a non-trivial function of $\lambda_k$ we need high probability estimates on $\lambda_k$.

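The key idea in this section is the observation that the accuracy information on the $\lambda$'s can be used to improve the local convexity of the measure $\mu$ in the direction involving the differences of $\lambda$'s. To explain this idea, we compute the Hessian of the Hamiltonian of $\mu$:

$$\big\langle \mathbf{v}, \nabla^2\mathcal{H}(\boldsymbol\lambda)\mathbf{v}\big\rangle \ge \varpi\|\mathbf{v}\|^2 + \frac{1}{N}\sum_{i<j}\frac{(v_i-v_j)^2}{(\lambda_i-\lambda_j)^2}.$$

The naive lower bound on $\nabla^2\mathcal{H}$ is $\varpi$, but for a typical $\boldsymbol\lambda = (\lambda_1,\lambda_2,\ldots,\lambda_N)$ it is in fact much better in most directions. To see this effect, suppose we know $|\lambda_i - \lambda_j| \lesssim M/N$ with some $M$ for any $i,j \in I^M_k$, where $I^M_k := \llbracket k-M, k+M\rrbracket$. Then for $\mathbf{v} = (v_{k-M},\ldots,v_{k+M})$ with $\sum_j v_j = 0$ we have

$$\big\langle \mathbf{v}, \nabla^2\mathcal{H}(\boldsymbol\lambda)\mathbf{v}\big\rangle \ge \frac{1}{N}\sum_{i<j;\ i,j\in I^M_k}\frac{(v_i-v_j)^2}{(\lambda_i-\lambda_j)^2} \gtrsim \frac{N}{M}\sum_i v_i^2. \tag{4.9}$$

This improves the convexity of the Hessian to $N/M$ on the hyperplane $\sum_j v_j = 0$. Let

$$\lambda^{[M]}_k := \frac{1}{|I^M_k|}\sum_{j\in I^M_k}\lambda_j = \frac{1}{2M+1}\sum_{j\in I^M_k}\lambda_j$$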
denote the block average of the locations of particles and rewrite

$$\lambda_k - \lambda^{[M]}_k = \sum_j\big(\lambda^{[M_{j-1}]}_k - \lambda^{[M_j]}_k\big)$$

as a telescopic sum with an appropriate sequence of $M_1 = 0, M_2, \ldots$ with the property that $M_j/M_{j-1} \le N^\varepsilon$. We can now use the improved concentration on the hyperplane $\sum_j v_j = 0$ to the variables $\lambda^{[M_{j-1}]}_k - \lambda^{[M_j]}_k$ to control the fluctuation of $\lambda_k - \lambda^{[M]}_k$. Since the fluctuation of $\lambda^{[M]}_k$ is very small for small $\varepsilon$, we finally arrive at the estimate

$$\mathbb{P}^\mu\big(|\lambda_k - \mathbb{E}^\mu(\lambda_k)| > a\big) \le Ce^{-cN^2a^2/M}. \tag{4.10}$$

From (4.10) we thus have that $|\lambda_k - \mathbb{E}^\mu\lambda_k| \lesssim \sqrt{M}/N$ with high probability. This improves the starting accuracy $|\lambda_i - \lambda_j| \lesssim M/N$ for $i,j \in I^M_k$ to $|\lambda_i - \lambda_j| \lesssim M'/N$ with some $M' \ll M$, provided that we can prove that $|\mathbb{E}^\mu(\lambda_i - \lambda_j)| \ll M'/N$. But the last inequality involves only expectations and it will follow from the analysis of the loop equation (4.7) we just mentioned above. Starting from $M = N$, this procedure can be repeated by decreasing $M$ step by step until we get the optimal accuracy, $M \sim O(1)$. The implementation of this argument in [11] is somewhat different from this sketch due to various technical issues, but it follows the same basic idea.
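The convexity gain in (4.9) rests on the elementary identity $\sum_{i<j}(v_i-v_j)^2 = (2M+1)\sum_i v_i^2$ for vectors with $\sum_j v_j = 0$. The following sketch checks both the identity and the resulting lower bound on a random block; it assumes, for illustration only, that all $2M+1$ points of the block lie in a window of width exactly $M/N$, and the function names are hypothetical.

```python
import random

def block_interaction_form(lam, v, n_total):
    """(1/N) * sum_{i<j} (v_i - v_j)^2 / (lam_i - lam_j)^2 over one block."""
    total = 0.0
    for i in range(len(lam)):
        for j in range(i + 1, len(lam)):
            total += (v[i] - v[j]) ** 2 / (lam[i] - lam[j]) ** 2
    return total / n_total

def convexity_demo(n_total=10_000, m=20, seed=0):
    """Compare the interaction part of the Hessian form with 2 (N/M) |v|^2."""
    rng = random.Random(seed)
    size = 2 * m + 1
    # Assumption: the whole block sits in a window of width M/N.
    lam = sorted(rng.uniform(0.0, m / n_total) for _ in range(size))
    v = [rng.gauss(0.0, 1.0) for _ in range(size)]
    mean = sum(v) / size
    v = [x - mean for x in v]                      # enforce sum_j v_j = 0
    norm2 = sum(x * x for x in v)
    pair_sum = sum((v[i] - v[j]) ** 2
                   for i in range(size) for j in range(i + 1, size))
    form = block_interaction_form(lam, v, n_total)
    lower = 2.0 * n_total / m * norm2              # (N/M^2)(2M+1) >= 2 N/M
    return form, lower, pair_sum, size * norm2
```

With these assumptions the quadratic form returned by `convexity_demo` is at least $2(N/M)\|\mathbf{v}\|^2$, far above the naive bound $\varpi\|\mathbf{v}\|^2$ when $M \ll N$.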



4.2 The local equilibrium measure 

Having completed the first step, the rigidity estimate, we now focus on the second step, i.e. on the uniqueness of the local Gibbs measure. Let $0 < \kappa < 1/2$. Choose $q \in [\kappa, 1-\kappa]$ and set $L = [Nq]$ (the integer part). Fix an integer $K = N^k$ with $k < 1$. We will study the local spacing statistics of $K$ consecutive particles

$$\{\lambda_j : j \in I\}, \qquad I = I_L := \llbracket L+1, L+K\rrbracket.$$

These particles are typically located near $E_q$ determined by the relation

$$\int_{-\infty}^{E_q}\varrho(t)\,dt = q.$$

Note that $|\gamma_L - E_q| \le C/N$.
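To illustrate the relation between $E_q$ and the classical location $\gamma_L$, the sketch below computes both for the semicircle density $\frac{1}{2\pi}(4-t^2)_+^{1/2}$, i.e. for the Gaussian potential; for a general $V$ the same bisection would apply with the corresponding distribution function. The function names are illustrative assumptions.

```python
import math

def semicircle_cdf(x):
    """Distribution function of the density (1/2pi) sqrt(4 - x^2) on [-2, 2]."""
    x = max(-2.0, min(2.0, x))
    return (0.5 + x * math.sqrt(4.0 - x * x) / (4.0 * math.pi)
            + math.asin(x / 2.0) / math.pi)

def semicircle_quantile(q, iterations=80):
    """Solve cdf(E) = q by bisection; this is E_q for the semicircle law."""
    lo, hi = -2.0, 2.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if semicircle_cdf(mid) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def classical_location(k, n):
    """Classical location gamma_k, defined by cdf(gamma_k) = k / N."""
    return semicircle_quantile(k / n)
```

For example, with $N = 1000$ and $q = 0.3333$ one has $L = 333$, and `classical_location(333, 1000)` differs from `semicircle_quantile(0.3333)` by an amount of order $1/N$, since the density is bounded away from zero in the bulk.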

We will distinguish the inside and outside particles by renaming them as

$$(\lambda_1,\lambda_2,\ldots,\lambda_N) := (y_1,\ldots,y_L,x_{L+1},\ldots,x_{L+K},y_{L+K+1},\ldots,y_N) \in \Xi^{(N)}, \tag{4.11}$$

but note that they keep their original indices. The notation $\Xi^{(N)}$ refers to the simplex $\{\mathbf{z} : z_1 < z_2 < \ldots < z_N\}$ in $\mathbb{R}^N$. In short we will write

$$\mathbf{x} = (x_{L+1},\ldots,x_{L+K}), \qquad\text{and}\qquad \mathbf{y} = (y_1,\ldots,y_L,y_{L+K+1},\ldots,y_N),$$

all in increasing order, i.e. $\mathbf{x} \in \Xi^{(K)}$ and $\mathbf{y} \in \Xi^{(N-K)}$. We will refer to the $y$'s as external points and to the $x$'s as internal points.

We will fix the external points (also called boundary conditions) and study conditional measures on the internal points. We define the local equilibrium measure on $\mathbf{x}$ with fixed boundary condition $\mathbf{y}$ by

$$\mu_{\mathbf{y}}(d\mathbf{x}) = \mu_{\mathbf{y}}(\mathbf{x})\,d\mathbf{x}, \qquad \mu_{\mathbf{y}}(\mathbf{x}) := \mu(\mathbf{y},\mathbf{x})\Big[\int\mu(\mathbf{y},\mathbf{x})\,d\mathbf{x}\Big]^{-1}. \tag{4.12}$$
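Note that for any fixed $\mathbf{y} \in \Xi^{(N-K)}$, the measure $\mu_{\mathbf{y}}$ is supported on configurations of $K$ points $\mathbf{x} = \{x_j\}_{j\in I}$ located in the interval $[y_L, y_{L+K+1}]$.

The Hamiltonian $\mathcal{H}_{\mathbf{y}}$ of the measure $\mu_{\mathbf{y}}(d\mathbf{x}) \sim \exp(-\beta N\mathcal{H}_{\mathbf{y}}(\mathbf{x}))\,d\mathbf{x}$ is given by

$$\mathcal{H}_{\mathbf{y}}(\mathbf{x}) := \sum_{i\in I}\frac{1}{2}V_{\mathbf{y}}(x_i) - \frac{1}{N}\sum_{i,j\in I,\ i<j}\log|x_j-x_i| \qquad\text{with}\qquad V_{\mathbf{y}}(x) := V(x) - \frac{2}{N}\sum_{j\notin I}\log|x-y_j|. \tag{4.13}$$

We now define the set of good boundary configurations with a parameter $\delta = \delta(N) > 0$

$$\mathcal{G}_\delta = \mathcal{G} := \big\{\mathbf{y} \in \Xi^{(N-K)} : |y_j - \gamma_j| \le \delta,\ \forall j \in \llbracket N\kappa/2, L\rrbracket \cup \llbracket L+K+1, N(1-\kappa/2)\rrbracket\big\}, \tag{4.14}$$

where $\kappa$ is a small constant to cut off points near the spectral edges. Some rather weak additional conditions for $\mathbf{y}$ near the spectral edges will also be needed. They can be built into the definition of $\mathcal{G}$ based upon the bounds (4.3) and (4.4) but we will neglect this issue here.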

Let $\sigma$ and $\mu$ be two measures of the form (1.10) with potentials $W$ and $V$ and densities $\varrho_W$ and $\varrho_V$, respectively. For our purpose $W(x) = x^2/2$, i.e., $\sigma$ is the Gaussian $\beta$-ensemble and its density $\varrho_W(t) = \frac{1}{2\pi}(4-t^2)_+^{1/2}$ is the Wigner semicircle law. Let the sequence $\gamma_j$ be the classical locations for $\mu$ and the sequence $\theta_j$ be the classical locations for $\sigma$. Similarly to the construction of the measure $\mu_{\mathbf{y}}$, for any positive integer $L' \in \llbracket 1, N-K\rrbracket$ we can construct the measure $\sigma_{\boldsymbol\theta}$ conditioned that the particles outside are given by the classical locations $\theta_j$ for $j \notin \llbracket L'+1, L'+K\rrbracket$. More precisely, we define a reference local Gaussian measure $\sigma_{\boldsymbol\theta} \sim \exp(-\beta N\mathcal{H}_{\boldsymbol\theta}(\mathbf{x}))\,d\mathbf{x}$ on the set $[\theta_{L'}, \theta_{L'+K+1}]$ via the Hamiltonian

$$\mathcal{H}_{\boldsymbol\theta}(\mathbf{x}) := \sum_{i\in I'}\frac{1}{2}W_{\boldsymbol\theta}(x_i) - \frac{1}{N}\sum_{i,j\in I',\ i<j}\log|x_j-x_i|, \qquad W_{\boldsymbol\theta}(x) := W(x) - \frac{2}{N}\sum_{j\notin I'}\log|x-\theta_j|, \tag{4.15}$$

where $I' := \llbracket L'+1, L'+K\rrbracket$. Since $L'$ will not play an active role, we will abuse the notation and set $L' = L$.

The measure $\mu_{\mathbf{y}}$ lives on the interval $[y_L, y_{L+K+1}]$ while the measure $\sigma_{\boldsymbol\theta}$ lives on the interval $[\theta_L, \theta_{L+K+1}]$ and it is difficult to compare them. But after an appropriate translation and dilation, they will live on the same interval and from now on we assume that $[y_L, y_{L+K+1}] = [\theta_L, \theta_{L+K+1}]$. The parameter $K = N^k$ has to be sufficiently small since $\varrho_V$ and $\varrho_W$ are not constant functions and we have to match these two densities quite precisely in the whole interval. There are some other subtle issues related to the rescaling, but we will neglect them here to concentrate on the main ideas. Our main result is the following theorem which is essentially a combination of Proposition 4.2 and Theorem 4.4 from [11].

Theorem 4.2. Let < ip < 4- Fix K = N k , 5 = N~ 1+v and k = ^-p. Then for y G Q s we have 



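$$\Big|\mathbb{E}^{\mu_{\mathbf{y}}}\frac{1}{K}\sum_{i\in I}O\big(N(x_i-x_{i+1})\big) - \mathbb{E}^{\sigma_{\boldsymbol\theta}}\frac{1}{K}\sum_{i\in I}O\big(N(x_i-x_{i+1})\big)\Big| \to 0 \tag{4.16}$$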



as $N \to \infty$ for any smooth and compactly supported test function $O$. A similar formula holds for more complicated observables of the form (3.9).



From the rigidity estimate, Theorem 4.1, it follows that $\mathcal{G}_\delta$ has an overwhelming probability, so the expectation $\mathbb{E}^{\mu_{\mathbf{y}}}$ can be changed to $\mathbb{E}^\mu$ and similarly for the reference measure. Once (4.16) is proven for all observables of the form (3.9), we get that the locally averaged gap statistics for $\mu$ coincide with those of the reference Gaussian case, hence they are universal. Since averaged gap statistics identifies locally averaged correlation functions, we obtain Theorem 1.4.

It remains to prove Theorem 4.2. The basic idea is to use the Dirichlet form inequality (3.33). Although (3.33) was stated for an infinite volume measure, it holds for any measure with repulsive logarithmic interactions in a finite volume and with the parameter $\tau^{-1}$ being the lower bound on the Hessian of the Hamiltonian. In our setting, we denote by $\tau_\sigma^{-1}$ the lower bound for $\nabla^2\mathcal{H}_{\boldsymbol\theta}$, and the Dirichlet form inequality becomes



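$$\Big|\frac{1}{K}\sum_{i\in I}\big[\mathbb{E}^{\mu_{\mathbf{y}}} - \mathbb{E}^{\sigma_{\boldsymbol\theta}}\big]O\big(N(x_i-x_{i+1})\big)\Big| \le C\Big(\frac{\tau_\sigma N^\varepsilon}{K}D(\mu_{\mathbf{y}}|\sigma_{\boldsymbol\theta})\Big)^{1/2} + Ce^{-cN^\varepsilon}\sqrt{S(\mu_{\mathbf{y}}|\sigma_{\boldsymbol\theta})}, \tag{4.17}$$

where

$$D(\mu|\sigma) := \frac{1}{2N}\int\Big|\nabla\sqrt{\frac{d\mu}{d\sigma}}\Big|^2 d\sigma. \tag{4.18}$$

Thus our task is to prove that

$$\frac{\tau_\sigma N^\varepsilon}{K}D(\mu_{\mathbf{y}}|\sigma_{\boldsymbol\theta}) \to 0. \tag{4.19}$$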



By definition, 



L+ls^j^L+K 



where Zj is defined as 



Zj := V(-) - 4 Y — ?-W'(x j ) + £- T y (4.20) 



k>L+K k>L+K 



Using the equilibrium relation (jl.20p between the potentials V, W and the densities gy, Qw, we 
have 

6v{y) j P 1 a f Qw(y) , , P 1 



z p ymidy-JL y ^—-p "-^>d y + ^ T y 

J R Xj-y N f-j xj -y k J^Xj-y N f-f Xj 



k<L •> -J nk J a k<L 

k>L+K k>L+K 



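Hence $Z_j$ is the sum of the error terms,

$$A_j := \int_{y\notin[y_L,y_{L+K+1}]}\frac{\varrho_V(y)}{x_j-y}\,dy - \frac{1}{N}\sum_{k\le L,\ k>L+K}\frac{1}{x_j-y_k}, \tag{4.21}$$

$$B_j := \int_{y_L}^{y_{L+K+1}}\frac{\varrho_V(y)-\varrho_W(y)}{x_j-y}\,dy, \tag{4.22}$$

and there is a term similar to $A_j$ with $y_j$ replaced by $\theta_j$ and $\varrho_V$ replaced by $\varrho_W$.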

With our convention, the total numbers of particles in the interval $[y_L, y_{L+K+1}]$ are equal and thus

$$\int_{y_L}^{y_{L+K+1}}\varrho_V(y)\,dy = \int_{y_L}^{y_{L+K+1}}\varrho_W(y)\,dy.$$

Since the densities $\varrho_V$ and $\varrho_W$ are $C^1$ functions away from the endpoints $A$ and $B$ and $y_{L+K+1} - y_L$ is small, $|\varrho_V - \varrho_W|$ is small in the interval $[y_L, y_{L+K+1}]$ and thus $B_j$ is small. For estimating $A_j$, we can replace the integral

$$\int_{-\infty}^{y_L}\frac{\varrho_V(y)}{x_j-y}\,dy \qquad\text{by}\qquad \frac{1}{N}\sum_{k\le L}\frac{1}{x_j-\gamma_k}$$

with negligible errors, at least for $j$'s away from the edges, $j \in \llbracket L+N^\varepsilon, L+K-N^\varepsilon\rrbracket$. Thus

$$|A_j| \lesssim \frac{1}{N}\Big|\sum_{k\notin I}T^k_j\Big|, \qquad T^k_j := \frac{1}{x_j-y_k} - \frac{1}{x_j-\gamma_k}, \tag{4.23}$$

and $T^k_j$ can be estimated by the assumption $|y_k - \gamma_k| \le \delta$ from $\mathbf{y} \in \mathcal{G}_\delta$. The same argument works if $j$ is close to the edge, but $k$ is away from the edges, i.e. $k \le L - N^\varepsilon$ or $k \ge L+K+N^\varepsilon$. The edge terms, for $|j-k| \le N^\varepsilon$, are difficult to estimate due to the singularity in the denominator and the event that many $y_k$'s with $k \le L$ may pile up near $y_L$. To resolve this difficulty, we show that the averaged local statistics of the measure $\mu_{\mathbf{y}}$ are insensitive to the change of the boundary conditions for $\mathbf{y}$ near the edges. This can be achieved by the simple inequality



$$\Big|\frac{1}{K}\sum_{i\in I}\int O\big(N(x_i-x_{i+1})\big)\big[d\mu_{\mathbf{y}} - d\mu_{\mathbf{y}'}\big]\Big| \le C\int\big|d\mu_{\mathbf{y}} - d\mu_{\mathbf{y}'}\big| \le C\sqrt{S(\mu_{\mathbf{y}}|\mu_{\mathbf{y}'})} \tag{4.24}$$

for any two boundary conditions $\mathbf{y}$ and $\mathbf{y}'$. Although we still have to estimate the entropy that includes a logarithmic singularity, this can be done much more easily since entropy is less sensitive to singularities than the Dirichlet form. Therefore, we can replace the boundary condition $y_k$ with $y_k' = \theta_k$ for $|j-k| \le N^\varepsilon$ and then the most singular edge terms in (4.20) cancel out.
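The second step in (4.24) is the classical Pinsker inequality, which in its sharp form reads $\int|d\mu - d\nu| \le \sqrt{2S(\mu|\nu)}$. The sketch below checks it on random discrete distributions; the discrete setting and the helper names are illustrative assumptions and only stand in for the continuous measures $\mu_{\mathbf{y}}$, $\mu_{\mathbf{y}'}$.

```python
import math
import random

def relative_entropy(mu, nu):
    """S(mu | nu) = sum_i mu_i log(mu_i / nu_i) for discrete distributions."""
    return sum(p * math.log(p / q) for p, q in zip(mu, nu) if p > 0.0)

def l1_distance(mu, nu):
    """Total variation norm of mu - nu, i.e. the discrete analogue of int |dmu - dnu|."""
    return sum(abs(p - q) for p, q in zip(mu, nu))

def random_distribution(size, rng):
    """A strictly positive random probability vector."""
    weights = [rng.random() + 1e-3 for _ in range(size)]
    total = sum(weights)
    return [w / total for w in weights]

def pinsker_gap(size=30, seed=0):
    """Return sqrt(2 S(mu|nu)) - |mu - nu|_1, which Pinsker says is >= 0."""
    rng = random.Random(seed)
    mu = random_distribution(size, rng)
    nu = random_distribution(size, rng)
    return math.sqrt(2.0 * relative_entropy(mu, nu)) - l1_distance(mu, nu)
```

For a bounded observable $O$ the difference of expectations is at most $\|O\|_\infty$ times the $L^1$ distance, which gives the first step in (4.24).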

We note that we can perform this replacement only for a small number of index pairs $(j,k)$, since estimating the gap distribution by the total entropy, as noted in (3.36) in Section 3.1, is not as efficient as the estimate using the Dirichlet form per particle. Thus we can afford to use this argument only for the edge terms, $|j-k| \le N^\varepsilon$. For all other index pairs $(j,k)$ we still have to estimate $T^k_j$ by exploiting that $\mathbf{y}$ is a good configuration, i.e. $y_k - \gamma_k$ is small.

Unfortunately, even with the optimal accuracy $\delta \sim N^{-1+\varepsilon'}$ in (4.14) as an input, the relation (4.19) still cannot be satisfied for any choice of $N^{C\varepsilon} \le K \le N^{1-C\varepsilon}$. To understand this problem, we remark that while the edge terms become a smaller percentage of the total terms in (4.24) as $K$ gets bigger, the relaxation time to equilibrium for $\sigma_{\boldsymbol\theta}$, determined by the convexity of $\mathcal{H}_{\boldsymbol\theta}$, increases at the same time. At the end of our calculation, there is no good regime for the choice of $K$. Fortunately, this can be resolved by using the idea of the local relaxation measure as in (3.22), i.e., we add a quadratic term $\frac{1}{2\tau}(x_j-\gamma_j)^2$ to the Hamiltonian of the measure $\mu_{\mathbf{y}}$ and $\frac{1}{2\tau}(x_j-\theta_j)^2$ for the measure $\sigma_{\boldsymbol\theta}$. With these ideas, we can complete the proof of Theorem 4.2.



5 Single gap universality 

In this section we outline the proofs of Theorem 1.3 and 1.5, following closely [44]. Both proofs rely on the single gap universality for the locally conditioned measure $\mu_{\mathbf{y}}$ introduced already in (4.12). This will be stated in Theorem 5.1 whose proof takes up most of this section. At the end, in Section 5.4, we complete the proofs of Theorem 1.3 and 1.5.

5.1 Statements on local equilibrium measures 
5.1.1 Definition 

We work in the bulk spectrum and we consider the local equilibrium measure on $\mathcal{K} := 2K+1$ points which is the conditional measure after fixing all other points. To define it precisely, we fix two small positive numbers, $\alpha, \delta > 0$ and choose two positive integer parameters $L, K$ such that

$$L \in \llbracket \alpha N, (1-\alpha)N\rrbracket, \qquad N^\delta \le K \le N^{1/4}. \tag{5.1}$$

All results will hold for any sufficiently small $\alpha, \delta$ and for any sufficiently large $N \ge N_0$, where the threshold $N_0$ depends on $\alpha, \delta$ and maybe on other parameters of the model.

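Denote by $I = I_{L,K} := \llbracket L-K, L+K\rrbracket$ the set of $\mathcal{K}$ consecutive indices in the bulk. As in Section 4.2 we will distinguish external and internal points by renaming them as

$$(\lambda_1,\lambda_2,\ldots,\lambda_N) := (y_1,\ldots,y_{L-K-1},x_{L-K},\ldots,x_{L+K},y_{L+K+1},\ldots,y_N) \in \Xi^{(N)}, \tag{5.2}$$

the only difference is that here the internal particles are labelled symmetrically to $L$. This discrepancy is only notational, but we prefer to follow the notations of the original papers. In short we will write

$$\mathbf{x} = (x_{L-K},\ldots,x_{L+K}) \in \Xi^{(\mathcal{K})}, \qquad\text{and}\qquad \mathbf{y} = (y_1,\ldots,y_{L-K-1},y_{L+K+1},\ldots,y_N) \in \Xi^{(N-\mathcal{K})}.$$

As in (4.12) we again define the local equilibrium measure on $\mathbf{x}$ with boundary condition $\mathbf{y}$ by

$$\mu_{\mathbf{y}}(d\mathbf{x}) := \mu_{\mathbf{y}}(\mathbf{x})\,d\mathbf{x}, \qquad \mu_{\mathbf{y}}(\mathbf{x}) := \mu(\mathbf{y},\mathbf{x})\Big[\int\mu(\mathbf{y},\mathbf{x})\,d\mathbf{x}\Big]^{-1}, \tag{5.3}$$

where $\mu = \mu(\mathbf{y},\mathbf{x})$ is the (global) equilibrium measure (1.10). For a fixed $\mathbf{y}$, this measure can also be written as a Gibbs measure,

$$\mu_{\mathbf{y}} = \mu_{\mathbf{y},\beta,V} = Z_{\mathbf{y}}^{-1}e^{-\beta N\mathcal{H}_{\mathbf{y}}}, \tag{5.4}$$

with Hamiltonian

$$\mathcal{H}_{\mathbf{y}}(\mathbf{x}) := \sum_{i\in I}\frac{1}{2}V_{\mathbf{y}}(x_i) - \frac{1}{N}\sum_{i,j\in I,\ i<j}\log|x_j-x_i|, \qquad V_{\mathbf{y}}(x) := V(x) - \frac{2}{N}\sum_{k\notin I}\log|x-y_k|. \tag{5.5}$$

Here $V_{\mathbf{y}}(x)$ can be viewed as the external potential of a $\beta$-log-gas of the points $\{x_i : i \in I\}$ in the configuration interval $J = J_{\mathbf{y}} := (y_{L-K-1}, y_{L+K+1})$.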

5.1.2 Universality of the local gap statistics for $\mu_{\mathbf{y}}$

Our main technical result, Theorem 5.1 below, asserts that the local gap statistics is essentially independent of $V$ and $\mathbf{y}$ as long as the boundary conditions $\mathbf{y}$ are regular. This property is expressed by defining the following set of "good" boundary conditions with some given positive parameters $\xi, \alpha$ (the set $\mathcal{G}$ in (4.14) played exactly the same role)

$$\mathcal{R} = \mathcal{R}_{L,K}(\xi,\alpha) := \big\{\mathbf{y} : |y_k-\gamma_k| \le N^{-1+\xi},\ k \in \llbracket \alpha N, (1-\alpha)N\rrbracket\setminus I_{L,K}\big\} \tag{5.6}$$
$$\cap\ \big\{\mathbf{y} : |y_k-\gamma_k| \le N^{-4/15+\xi},\ k \in \llbracket N^{3/5+\xi}, N-N^{3/5+\xi}\rrbracket\big\}$$
$$\cap\ \big\{\mathbf{y} : |y_k-\gamma_k| \le 1,\ k \in \llbracket 1,N\rrbracket\setminus I_{L,K}\big\}.$$

This definition is tailored to the rigidity bounds for the $\beta$-ensemble, see Theorem 4.1. Note that $\mathcal{R}$ has a key parameter, the exponent $\xi$, which will be chosen as an arbitrary small positive number in the applications. We will not follow its dependence precisely and we will often neglect it from the notation, i.e. we will talk about "good" boundary conditions $\mathbf{y} \in \mathcal{R}$.

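Good boundary conditions give rise to a regular potential $V_{\mathbf{y}}$. More precisely, if $\mathbf{y} \in \mathcal{R}$, then

$$|J_{\mathbf{y}}| = \frac{\mathcal{K}}{N\varrho(\bar y)} + O\Big(\frac{N^\xi}{N}\Big), \tag{5.7}$$

$$V_{\mathbf{y}}'(x) = \varrho(\bar y)\log\frac{d_+(x)}{d_-(x)} + O\Big(\frac{N^\xi}{Nd(x)}\Big), \qquad x \in J_{\mathbf{y}}, \tag{5.8}$$

$$V_{\mathbf{y}}''(x) \ge \inf V'' + \frac{c}{d(x)}, \qquad x \in J_{\mathbf{y}}. \tag{5.9}$$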
Here

$$\bar y := \frac{1}{2}\big(y_{L-K-1} + y_{L+K+1}\big)$$

denotes the midpoint of the configuration interval, $d(x) := \min\{|x-y_{L-K-1}|, |x-y_{L+K+1}|\}$ is the distance of $x$ to the boundary of the configuration interval $J = (y_{L-K-1}, y_{L+K+1})$ and $d_-(x)$ and $d_+(x)$ are regularized versions of the distances of $x$ to the closest and to the farthest endpoints of $J$, respectively. The key point is that the leading term of $V_{\mathbf{y}}(x)$ depends on the boundary conditions only through the density in the center, $\varrho(\bar y)$. We also introduce

$$\alpha_j := \bar y + \frac{j-L}{\mathcal{K}+1}|J|, \qquad j \in I_{L,K}, \tag{5.10}$$

to denote the $\mathcal{K}$ equidistant points within the interval $J$.



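Theorem 5.1 (Gap universality for local measures). Fix $L$, $\widetilde L$ and $\mathcal{K} = 2K+1$ satisfying (5.1) with an exponent $\delta > 0$. Consider two boundary conditions $\mathbf{y}, \tilde{\mathbf{y}}$ such that the configuration intervals coincide,

$$J = (y_{L-K-1}, y_{L+K+1}) = (\tilde y_{\widetilde L-K-1}, \tilde y_{\widetilde L+K+1}). \tag{5.11}$$

We consider the measures $\mu = \mu_{\mathbf{y},\beta,V}$ and $\tilde\mu = \mu_{\tilde{\mathbf{y}},\beta,\widetilde V}$ defined as in (5.4), with possibly two different external potentials $V$ and $\widetilde V$. Let $\xi > 0$ be a small constant. Assume that $|J|$ satisfies

$$|J| = \frac{\mathcal{K}}{N\varrho(\bar y)} + O\Big(\frac{K^\xi}{N}\Big). \tag{5.12}$$

Suppose that $\mathbf{y}, \tilde{\mathbf{y}} \in \mathcal{R}$ and that

$$\max_{j\in I_{L,K}}\big|\mathbb{E}^{\mu_{\mathbf{y}}}x_j - \alpha_j\big| + \max_{j\in I_{\widetilde L,K}}\big|\mathbb{E}^{\tilde\mu_{\tilde{\mathbf{y}}}}x_j - \alpha_j\big| \le CN^{-1}K^\xi \tag{5.13}$$

holds. Let the integer number $p$ satisfy $|p| \le K - K^{1-\xi^*}$ for some small $\xi^* > 0$. Then there exists $\xi_0 > 0$, depending on $\delta$, such that if $\xi, \xi^* \le \xi_0$ then for any $n$ fixed and any bounded smooth observable $O : \mathbb{R}^n \to \mathbb{R}$ with compact support we have

$$\Big|\mathbb{E}^{\mu_{\mathbf{y}}}O\big(N(x_{L+p}-x_{L+p+1}),\ldots,N(x_{L+p}-x_{L+p+n})\big) - \mathbb{E}^{\tilde\mu_{\tilde{\mathbf{y}}}}O\big(N(x_{\widetilde L+p}-x_{\widetilde L+p+1}),\ldots,N(x_{\widetilde L+p}-x_{\widetilde L+p+n})\big)\Big| \le CK^{-\varepsilon} \tag{5.14}$$

for some $\varepsilon > 0$ depending on $\delta, \alpha$ and for some $C$ depending on $O$. This holds for any $N \ge N_0$ sufficiently large, where $N_0$ depends on the parameters $\xi, \xi^*, \alpha$, and $C$ in (5.13).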



5.1.3 Rigidity and level repulsion of $\mu_{\mathbf{y}}$

In the following two theorems we establish rigidity and level repulsion estimates for the local log-gas $\mu_{\mathbf{y}}$ with good boundary conditions $\mathbf{y}$. While both rigidity and level repulsion are basic questions for log gases and are interesting in themselves, our main motivation to prove these theorems is to use them in the proof of Theorem 5.1.

We remark that an almost optimal rigidity estimate in the bulk was given in Theorem 4.1 and some level repulsion bound was given in (4.11) of [11]; these results hold with respect to the global measure $\mu$. For the proof of Theorem 5.1 we need their local versions with respect to $\mu_{\mathbf{y}}$, at least for most $\mathbf{y}$. Naively, this looks like a simple conditioning argument, but there is a subtle point. From the estimates w.r.t. $\mu$, one can conclude that $\mu_{\mathbf{y}}$ has a good rigidity bound for a set of boundary conditions with high probability w.r.t. the global measure $\mu$. This will be sufficient for the proof of Theorem 1.5 but not for Theorem 1.3. In the proof for the gap universality of Wigner matrices we will need a rigidity estimate for $\mu_{\mathbf{y}}$ for a set of $\mathbf{y}$'s with high probability with respect to the time evolved measure $f_t\mu$, which may be asymptotically singular to $\mu$ for large $N$. The following result asserts that a rigidity estimate holds for $\mu_{\mathbf{y}}$ provided that $\mathbf{y}$ itself satisfies a rigidity bound and an extra condition, (5.15), holds. This condition will have to be verified with different methods in the Wigner case.

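Theorem 5.2 (Rigidity estimate for local measures). For $\mathbf{y} \in \mathcal{R}$ consider the local equilibrium measure $\mu_{\mathbf{y}}$ defined in (5.4) and assume that

$$\big|\mathbb{E}^{\mu_{\mathbf{y}}}x_k - \alpha_k\big| \le CN^{-1}K^\xi, \qquad k \in I = I_{L,K}, \tag{5.15}$$

is satisfied. Then there are positive constants $C, c$, depending on $\xi$, such that for any $k \in I$ and $u > 0$,

$$\mathbb{P}^{\mu_{\mathbf{y}}}\big(N|x_k-\alpha_k| > uK^\xi\big) \le Ce^{-cu^2}. \tag{5.16}$$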



The proof of this result is similar to that of the concentration estimate (4.10) in Theorem 4.1. To estimate $x_k - \mathbb{E}^{\mu_{\mathbf{y}}}x_k$, we again use a multiscale argument of local averages for which stronger convexity bounds are available. The analogue of the accuracy estimate controlling $\mathbb{E}^{\mu_{\mathbf{y}}}x_k - \gamma_k$ in Theorem 4.1 is replaced by the assumption (5.15). Notice that, unlike for the global measure $\mu$, a direct accuracy control via the loop equation is not available for $\mu_{\mathbf{y}}$ since the potential $V_{\mathbf{y}}$ is not analytic.

Now we state the level repulsion estimates. 

Theorem 5.3 (Level repulsion estimate for local measures). For $\mathbf{y} \in \mathcal{R}$ we have the following estimates:

i) [Weak form of level repulsion] For any $s > 0$ we have

$$\mathbb{P}^{\mu_{\mathbf{y}}}\big[N(x_{i+1}-x_i) \le s\big] \le C(Ns)^{\beta+1}, \qquad i \in \llbracket L-K-1, L+K\rrbracket. \tag{5.17}$$

ii) [Strong form of level repulsion] Suppose that there exist positive constants $C, c$ such that the following rigidity estimate holds for any $k \in I$:

$$\mathbb{P}^{\mu_{\mathbf{y}}}\big(N|x_k-\alpha_k| > CK^{\xi^2}\big) \le C\exp(-K^c). \tag{5.18}$$

Then there exists a small constant $\theta$, depending on $C, c$ in (5.18), such that for any $s \ge \exp(-K^\theta)$, we have

$$\mathbb{P}^{\mu_{\mathbf{y}}}\big[N(x_{i+1}-x_i) \le s\big] \le C\big(K^\xi s\log N\big)^{\beta+1}, \qquad i \in \llbracket L-K-1, L+K\rrbracket. \tag{5.19}$$

The level repulsion bounds will mostly be used in the following estimate which trivially follows from Theorem 5.3.



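Corollary 5.4. Let $\mathbf{y} \in \mathcal{R}$, then for any $p < \beta+1$ we have

$$\mathbb{E}^{\mu_{\mathbf{y}}}\frac{1}{[N|x_i-x_{i+1}|]^p} \le C_pK^\xi, \qquad i \in \llbracket L-K-1, L+K\rrbracket. \tag{5.20}$$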



□ 
5.1.4 Sketch of the proof of the level repulsion

The proof of part (ii) of Theorem 5.3 goes in three steps. For simplicity, we consider (5.19) only for the first gap, i.e. $i = L-K-1$ and we also assume that $\bar y = 0$ by a simple shift.

Step 1. In this step we prove

$$\mathbb{P}^{\mu_{\mathbf{y}}}\big(x_{L-K} - y_{L-K-1} \le s/N\big) \le CKs\log N, \tag{5.21}$$

which is essentially (5.19) but with factor $K$ instead of $K^\xi$ and with the exponent $\beta+1$ replaced with one. The proof of (5.21) is a dilation argument. For a nonnegative parameter $\varphi$, we define

$$Z_\varphi := \int\ldots\int_{-a+a\varphi}^{a-a\varphi}d\mathbf{x}\prod_{i,j\in I,\ i<j}(x_i-x_j)^\beta e^{-N\frac{\beta}{2}\sum_jV_{\mathbf{y}}(x_j)}$$
$$= (1-\varphi)^{\mathcal{K}+\beta\mathcal{K}(\mathcal{K}-1)/2}\int\ldots\int_{-a}^{a}d\mathbf{w}\prod_{i<j}(w_i-w_j)^\beta e^{-N\frac{\beta}{2}\sum_jV_{\mathbf{y}}((1-\varphi)w_j)}, \tag{5.22}$$

where we set

$$a := -y_{L-K-1}, \qquad w_j := (1-\varphi)^{-1}x_{L+j}, \qquad d\mathbf{x} = \prod_{|j|\le K}dx_{L+j}, \qquad d\mathbf{w} = \prod_{|j|\le K}dw_j.$$

Clearly $Z_{\varphi=0}$ is the normalization constant of the measure $\mu_{\mathbf{y}}$ and we have

$$\mathbb{P}^{\mu_{\mathbf{y}}}\big(x_{L-K}-(-a) \ge a\varphi\big) \ge \frac{Z_\varphi}{Z_0}. \tag{5.23}$$

The multiple integral on the r.h.s. of (5.22) is almost the same as $Z_0$, except that the argument of $V_{\mathbf{y}}$ is rescaled by $1-\varphi$. This effect can be estimated from the explicit formula (5.5) for $V_{\mathbf{y}}$. The external potential $V$ in (5.5) is unproblematic since it is smooth. Due to $\mathbf{y} \in \mathcal{R}$, the points $y_j$ are regularly spaced on scales at least $K^\xi/N$, thus the sum of the interaction terms $\log|x-y_k|$ for $k$'s away from the edges of $I^c$, i.e. $k \le L-2K$ or $k \ge L+2K$, is a regular function of $x$ and the effect of dilation can be well approximated by Taylor expansion. For nearby $k$'s right below the lower edge, i.e. $L-2K \le k \le L-K$, we use the trivial bound $(1-\varphi)x - y_k \ge (1-\varphi)(x-y_k)$. From these estimates it follows that

$$\frac{Z_\varphi}{Z_0} \ge 1 - CK^2\varphi\log N. \tag{5.24}$$

Since $a \sim K/N$, together with (5.23) it implies (5.21).

Step 2. Now we consider an auxiliary measure which is a slightly modified version of the local equilibrium measure:

$$\mu^{(0)}_{\mathbf{y}} := Z^{(0)}(x_{L-K}-y_{L-K-1})^{-\beta}\mu_{\mathbf{y}}, \tag{5.25}$$

where $Z^{(0)}$ is chosen for normalization. In other words, we drop the term $(x_{L-K}-y_{L-K-1})^\beta$ from the measure $\mu_{\mathbf{y}}$. Setting $X := x_{L-K}-y_{L-K-1}$ for brevity, we have

$$\mathbb{P}^{\mu_{\mathbf{y}}}[X \le s/N] = \frac{\mathbb{E}^{\mu^{(0)}}\big[\mathbf{1}(X \le s/N)X^\beta\big]}{\mathbb{E}^{\mu^{(0)}}\big[X^\beta\big]}. \tag{5.26}$$

The estimate (5.21) also holds for $\mu^{(0)}_{\mathbf{y}}$ and thus

$$\mathbb{E}^{\mu^{(0)}}\big[\mathbf{1}(X \le s/N)X^\beta\big] \le C(s/N)^\beta Ks\log N$$

and with the choice $s = cK^{-1}(\log N)^{-1}$ in (5.21) we also have

$$\mathbb{P}^{\mu^{(0)}}\Big(X \ge \frac{c}{NK\log N}\Big) \ge 1/2$$

with some positive constant $c$. This implies that

$$\mathbb{E}^{\mu^{(0)}}\big[X^\beta\big] \ge \frac{1}{2}\Big(\frac{c}{NK\log N}\Big)^\beta.$$

Combining with (5.26), we have thus proved that

$$\mathbb{P}^{\mu_{\mathbf{y}}}[X \le s/N] \le C(Ks\log N)^{\beta+1}, \tag{5.27}$$

i.e. we obtained (5.21) but with an exponent $\beta+1$ in the r.h.s.

Step 3. We now improve the constant $K$ to $K^\xi$ in the r.h.s. of (5.21). The factor $K$ originated from the number of particles in $\mu_{\mathbf{y}}$. We can further condition the measure $\mu_{\mathbf{y}}$ on the points

$$z_j := x_j, \qquad j > L-K+K^\xi,$$

and we let $\mu_{\mathbf{y},\mathbf{z}}$ denote the conditional measure on the remaining $x$ variables $\{x_j : L-K \le j \le L-K+K^\xi\}$. From the rigidity estimate (5.18) we have $(\mathbf{y},\mathbf{z}) \in \mathcal{R}$ with a very high probability w.r.t. $\mu_{\mathbf{y}}$. We will now apply (5.27) to the measure $\mu_{\mathbf{y},\mathbf{z}}$ to obtain

$$\mathbb{P}^{\mu_{\mathbf{y},\mathbf{z}}}[X \le s/N] \le C\big(K^\xi s\log N\big)^{\beta+1}. \tag{5.28}$$

This holds for all $\mathbf{z}$ with a high $\mu_{\mathbf{y}}$-probability. The subexponential lower bound on $s$, assumed in part ii) of Theorem 5.3, allows us to include the probability of the complement of $\mathcal{R}$ in the estimate, we thus have proved (5.19).

The proofs of the weaker bound (5.17) for any $s > 0$ use similar arguments that have led to (5.21), but without assuming $\mathbf{y} \in \mathcal{R}$, which yields that one factor of $K$ has to be replaced with $N$ in (5.24). The assumption that the boundary conditions are good needs to be dropped since in Step 3 of the above argument, (5.21) is also used after additional conditioning on $\mathbf{z}$, distributed according to $\mu_{\mathbf{y}}$, and without (5.18) there is no rigidity result available for $\mu_{\mathbf{y}}$.



5.2 Proof of Theorem 5.1

In this section, we start to compare gap distributions of two local log-gases on the same configuration interval but with different external potential and boundary conditions. For simplicity, we consider only an observable of a single gap; a few consecutive gaps can be handled similarly. From now on, we use microscopic coordinates, i.e. we replace $x_j$ with $x_j/N$, and we also relabel the indices so that the coordinates of $\mathbf{x}$ are $x_j$, $j \in I = \{-K,\ldots,0,1,\ldots,K\}$. This will have the advantage that $K$ remains the only large parameter; $N$ disappears.

The local equilibrium measures and their Hamiltonians will be denoted by the same symbols, $\mu_{\mathbf{y}}$ and $\mathcal{H}_{\mathbf{y}}$, as before, but with a slight abuse of notations we redefine them now to the microscopic scaling, i.e.

$$\mathcal{H}_{\mathbf{y}}(\mathbf{x}) := \sum_{i\in I}\frac{1}{2}V_{\mathbf{y}}(x_i) - \sum_{i,j\in I,\ i<j}\log|x_j-x_i|, \qquad V_{\mathbf{y}}(x) := NV(x/N) - 2\sum_{j\notin I}\log|x-y_j|. \tag{5.29}$$
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The other Hamiltonian H y is defined in a similar way with V in (|5.29p replaced with another 
external potential V. We also rewrite f)5. 13j) in the microscopic coordinate as 

\W*Xj -aj\ + \W^Xj - atj\ < CK*, (5.30) 

where ctj := j^+jl^l is the rescaled version of the definition given in (|5,10p . but we keep the same 
notation. The concept of "good" set 1Z is also rescaled accordingly. 
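Suppose that $y, \tilde y \in \mathcal{R}$ and define the interpolating measures

$\omega^r_{y,\tilde y} := Z_r\, e^{-\beta r (\tilde V_{\tilde y}(x) - V_y(x))}\, \mu_y, \qquad r \in [0,1],$ (5.31)

so that $\omega^{1}_{y,\tilde y} = \tilde\mu_{\tilde y}$ and $\omega^{0}_{y,\tilde y} = \mu_y$ ($Z_r$ is a normalization constant). This is again a local log-gas with Hamiltonian

$H^r_{y,\tilde y}(x) = \frac{1}{2}\sum_{i \in I} V^r_{y,\tilde y}(x_i) - \sum_{i<j} \log|x_i - x_j|, \qquad V^r_{y,\tilde y}(x) := (1-r) V_y(x) + r \tilde V_{\tilde y}(x).$ (5.32)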

For any fixed $r$, the measure $\omega^r_{y,\tilde y}$ inherits all relevant properties of $\mu_y$. In particular, the rigidity bound in the form

$P^{\omega}(|x_i - \alpha_i| \ge C K^{C\xi}) \le C e^{-K^{\theta}}, \qquad i \in I,$ (5.33)

the level repulsion bounds (5.17)-(5.19) and their consequence in (5.20) hold w.r.t. the measure $\omega = \omega^r_{y,\tilde y}$ as well (in the new microscopic coordinates there are no $N$ factors on the left hand sides of these inequalities). The proofs are basically parallel to the arguments for $\mu_y$; the only nontrivial step is to show that (5.30) implies the analogous bound

$|E^{\omega} x_k - \alpha_k| \le C K^{\xi}$

w.r.t. $\omega = \omega^r_{y,\tilde y}$ as well. Although $\omega$ appears to be some easy combination of $\mu_y$ and $\tilde\mu_{\tilde y}$, this conclusion is nontrivial. It requires comparing $\omega$ and $\mu_y$ via the entropy inequality, which involves controlling the exponential moment of $|x_k - \alpha_k|$ w.r.t. $\mu_y$. At this point the Gaussian tail proven in (5.16) is necessary.

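The right hand side of (5.14) with $n = 1$, in the rescaled coordinates and with $L = \tilde L = 0$, is estimated by

$\big| [E^{\mu_y} - E^{\tilde\mu_{\tilde y}}]\, O(x_p - x_{p+1}) \big| \le \int_0^1 dr\, \Big| \frac{d}{dr} E^{\omega^r_{y,\tilde y}} O(x_p - x_{p+1}) \Big|.$ (5.34)

For any bounded smooth function $O$ with compact support

$\frac{d}{dr} E^{\omega^r_{y,\tilde y}} O(x_p - x_{p+1}) = \beta \langle h_0; O(x_p - x_{p+1}) \rangle_{\omega^r_{y,\tilde y}},$ (5.35)

where

$h_0 = h_0(x) = \sum_{i \in I} \big( V_y(x_i) - \tilde V_{\tilde y}(x_i) \big)$ (5.36)

and $\langle f; g \rangle_\omega := E^\omega fg - (E^\omega f)(E^\omega g)$ denotes the covariance. Thus Theorem 5.1 follows immediately from the following estimate on the gap covariance function. □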
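
The mechanism behind (5.35), namely that differentiating an exponentially tilted expectation in the tilt parameter produces a covariance, can be checked numerically in a one-dimensional toy model. The sketch below is only an illustration: the reference weight, the function `h0` and the observable are made-up stand-ins (assumptions, not the actual $\mu_y$, $V_y - \tilde V_{\tilde y}$ or $O$ of the text), and the tilt is taken in the form $\omega_r \propto e^{\beta r h_0}\mu$.

```python
import numpy as np

def tilted_stats(r, beta=2.0):
    """Return (E[O], Cov(h0, O)) under omega_r proportional to exp(beta*r*h0)*mu on a grid."""
    x = np.linspace(-8.0, 8.0, 4001)
    mu = np.exp(-0.5 * x**2)            # toy reference weight (assumption, stands in for mu_y)
    h0 = 0.3 * np.sin(x) + 0.1 * x      # toy stand-in for V_y - tilde V_{tilde y}
    obs = np.exp(-(x - 0.5) ** 2)       # toy smooth observable O
    w = mu * np.exp(beta * r * h0)
    w /= w.sum()                        # normalization plays the role of Z_r
    e_obs = np.dot(w, obs)
    cov = np.dot(w, h0 * obs) - np.dot(w, h0) * e_obs
    return e_obs, cov

def check_derivative(r=0.4, beta=2.0, dr=1e-4):
    """Compare a central finite difference of r -> E[O] with beta * Cov(h0, O)."""
    plus, _ = tilted_stats(r + dr, beta)
    minus, _ = tilted_stats(r - dr, beta)
    finite_diff = (plus - minus) / (2 * dr)
    _, cov = tilted_stats(r, beta)
    return finite_diff, beta * cov
```

Calling `check_derivative()` returns the two sides of the identity, which should agree up to the finite-difference error.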

Theorem 5.5. Consider two smooth potentials $V, \tilde V$ and two good boundary conditions, $y, \tilde y \in \mathcal{R}$, such that the configuration intervals coincide, $J_y = J_{\tilde y}$. For any $r \in [0,1]$ let $\omega = \omega^r_{y,\tilde y}$ be the interpolating measure defined in (5.31). Assume that (5.30) holds for both boundary conditions $y, \tilde y$. Fix $\xi^* > 0$. Then there exist $\varepsilon > 0$ and $C > 0$, depending on $\xi^*$, such that for any sufficiently small $\xi$, for $|p| \le K^{1-\xi^*}$ we have

$|\langle h_0(x); O(x_p - x_{p+1}) \rangle_\omega| \le K^{C\xi} K^{-\varepsilon}$ (5.37)

for any smooth function $O : \mathbb{R} \to \mathbb{R}$ with compact support, provided that $K$ is large enough.
Theorem 5.5 is our key technical result. The main difficulty behind it is due to the fact that the covariance function of two points, $\langle x_i; x_j \rangle_\omega$, decays only logarithmically. In fact, for the GUE, Gustavsson proved that (Theorem 1.3 in [54])

$\langle x_i; x_j \rangle_{GUE} \sim \log \frac{N}{|i-j|+1},$ (5.38)

and a similar formula is expected for $\omega$. Although $h_0(x)$ depends strongly only on points near the boundary and $x_p$ is away from the boundary, it is still very difficult to prove Theorem 5.5 based on this slow logarithmic decay. However, the covariance function of the type

$\langle g_1(x_i); g_2(x_j - x_{j+1}) \rangle_\omega$ (5.39)

decays much faster in $|i-j|$. Since the second factor $g_2(x_j - x_{j+1})$ depends only on the difference of two neighboring points, it is expected that the decay is the (discrete) derivative in $j$ of the covariance (5.38), i.e. it is $|i-j|^{-1}$. The actual result (5.37) is much weaker, but it still provides a power-law decay in $K$ instead of a logarithmic decay. Covariances of the form $\langle g_1(x_i - x_{i+1}); g_2(x_j - x_{j+1}) \rangle_\omega$ are expected to decay even faster but we have not pursued this direction further.

We point out that the fact that observables of differences of particles behave much more nicely was a basic observation in the DBM analysis (Theorem 3.4); see the explanation around (3.36).
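
The heuristic that a gap observable turns logarithmic decay into a power law can be illustrated with a few lines of arithmetic. The sketch below takes the right hand side of (5.38) literally as a model covariance (an assumption made only for illustration; (5.38) is an asymptotic statement) and computes its discrete derivative in $j$, which behaves like $1/(|i-j|+1)$.

```python
import math

def model_covariance(i, j, n):
    """Model covariance log(N / (|i-j| + 1)), read off from (5.38) as an assumption."""
    return math.log(n / (abs(i - j) + 1))

def gap_covariance(i, j, n):
    """Discrete derivative in j: model covariance of x_i with the gap x_j - x_{j+1}."""
    return model_covariance(i, j, n) - model_covariance(i, j + 1, n)

def rescaled_gap_covariance(distance, n=10**6):
    """(distance + 1) * gap covariance; close to 1 when the decay is 1/(distance + 1)."""
    return (distance + 1) * gap_covariance(0, distance, n)
```

For separations 10, 100 and 1000 the rescaled value stays just below 1, so in this model the gap covariance no longer depends on $N$ and decays like the inverse separation.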



5.3 Decay of correlation functions: Proof of Theorem 5.5

We will express the difference of gap distributions between two measures in terms of random walks 
in time dependent random environments. The decay of correlation functions will be translated into 
a partial regularity property of the corresponding parabolic equation. This partial regularity is a 
discrete version of the De Giorgi-Nash-Moser theory but with a long range elliptic part. 



5.3.1 Random Walk Representation 

In this section we derive a random walk representation for the gap correlation function on the left hand side of (5.37). We will apply it for the interpolating measure $\omega = \omega^r_{y,\tilde y}$ from (5.31) and for the function $h_0$ given in (5.36), but the representation formula (Proposition 5.6 below) is valid for any $\omega$ and $h_0$.

Let $\mathcal{L}^\omega$ be the reversible generator given by the Dirichlet form

$D^\omega(f) = \sum_{|j| \le K} \int (\partial_j f)^2 \, d\omega.$ (5.40)

This process can also be characterized by the following SDE

$dx_i = \sqrt{2}\, dB_i + \beta \Big[ -\frac{1}{2} (V^r_{y,\tilde y})'(x_i) + \sum_{j \ne i} \frac{1}{x_i - x_j} \Big] dt,$ (5.41)

where $\{B_i : |i| \le K\}$ is a family of independent standard real Brownian motions. Let $E_x$ denote the expectation for this process with initial point $x(0) = x$. The expectation with respect to the process starting from equilibrium is $E^\omega[\cdot] = \int E_x[\cdot]\, \omega(dx)$. With a slight abuse of notation, when we talk about the process, we will use $P^\omega$ and $E^\omega$ also to denote the probability and expectation w.r.t. this dynamics with initial data distributed w.r.t. $\omega$, i.e., in equilibrium.
Suppose $h(t) = h(t,x)$ is the solution of the equation $\partial_t h = \mathcal{L}^\omega h$ with an initial condition $h_0$. Introduce the notation

$v(t,x) = \nabla_x h(t,x)$, i.e. $v_j(t,x) := \partial_{x_j} h(t,x)$. (5.42)

By integrating the time derivative of $\langle h(t,x); O(x_p - x_{p+1}) \rangle_\omega$ and using the equation that $h(t,x)$ satisfies, we have

$\langle h_0(x); O(x_p - x_{p+1}) \rangle_\omega = \int_0^\infty d\sigma \int O'(x_p - x_{p+1}) [v_p(\sigma, x) - v_{p+1}(\sigma, x)]\, d\omega(x).$ (5.43)

For any fixed $\sigma$, the inner integral on the right hand side can be expressed by a random walk representation. Fix a path $\{x(s) : s \in [0,\sigma]\}$. Define the following operators on $\mathbb{R}^{\mathcal{K}}$:

$A(s) := B(s) + W(s),$ (5.44)

$[B(s)v]_j := -\sum_{k \ne j} B_{jk}(x(\sigma - s)) (v_k - v_j), \qquad B_{jk}(x) := \frac{1}{(x_j - x_k)^2},$ (5.45)

$[W(s)v]_j := [V^r_{y,\tilde y}]''(x_j(\sigma - s))\, v_j.$

Clearly $B(s)$ is a diffusion operator with random rates and $W(s)$ is a potential representing a random environment. These operators depend on the whole path $x(s)$, but we omit this fact from the notation.

With these notations we have the following representation:

Proposition 5.6. For any smooth real-valued function $h_0$ and for any $p \in I$, $-K \le p \le K - 1$, we have

$\langle h_0; O(x_p - x_{p+1}) \rangle_\omega = \int_0^\infty d\sigma \int O'(x_p - x_{p+1})\, E_x[w_p(\sigma; x(\cdot), \sigma) - w_{p+1}(\sigma; x(\cdot), \sigma)]\, \omega(dx).$ (5.46)

Here, for any $\sigma > 0$ and for any fixed path $\{x(s) : s \in [0,\sigma]\}$, we let $w$ denote the solution of the evolution equation

$\partial_s w(s; x(\cdot), \sigma) = -A(s) w(s; x(\cdot), \sigma),$ (5.47)

with initial data $w(0; x(\cdot), \sigma) := \nabla h_0(x(\sigma))$.

This representation in a slightly different setting already appeared in Proposition 2.2 of [16] (see also Proposition 3.1 in [53]), which was a probabilistic formulation of the idea of Helffer and Sjöstrand [55] and Naddaf and Spencer [69]. The proof relies on taking the gradient of the equation $\partial_t h = \mathcal{L}^\omega h$. A direct computation of the commutator $[\nabla, \mathcal{L}^\omega]$ yields that

$\partial_t v(t,x) = \mathcal{L}^\omega v(t,x) - A(x) v(t,x),$ (5.48)

with initial condition $v_0(x) = v(0,x) = \nabla h_0(x)$. Here $A(x) = B(x) + W(x)$, where $B(x)$ is the operator given by the matrix $B_{jk}$ in (5.45) and $W(x)$ is the diagonal multiplication operator by $[V^r_{y,\tilde y}]''(x_j)$. Since $\mathcal{L}^\omega$ generates the process $x(t)$, we can represent the solution to (5.48) by the Feynman-Kac formula, which can be written in the form (5.46).
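
In a Gaussian toy model the representation can be verified by hand, which may help to see what (5.43) and (5.48) say. The following sketch rests on assumptions that are not part of the text: $\omega$ is taken to be the Gaussian measure with density proportional to $e^{-\frac{1}{2} x \cdot M x}$ for a fixed positive definite matrix `hess` $= M$, the Dirichlet form is normalized as $\sum_j \int (\partial_j f)^2 d\omega$, and $h_0$ and $O$ are linear. Then $A(x) = M$ does not depend on $x$, the gradient flow is $v(\sigma) = e^{-\sigma M} \nabla h_0$, and the time integral should reproduce the Gaussian covariance $\nabla O \cdot M^{-1} \nabla h_0$.

```python
import numpy as np

def covariance_via_flow(hess, grad_h0, grad_obs, t_max=60.0, steps=6000):
    """Trapezoid approximation of int_0^T grad_obs . exp(-sigma*hess) grad_h0 d(sigma)."""
    evals, evecs = np.linalg.eigh(hess)
    a = evecs.T @ grad_h0
    b = evecs.T @ grad_obs
    sigmas = np.linspace(0.0, t_max, steps + 1)
    # v(sigma) = exp(-sigma*hess) grad_h0 solves dv/dsigma = -hess v (no x-dependence here)
    integrand = (np.exp(-np.outer(sigmas, evals)) * (a * b)).sum(axis=1)
    dt = sigmas[1] - sigmas[0]
    return dt * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))

def toy_example(n=7, p=3):
    """Compare the flow integral with the exact Gaussian covariance in a tridiagonal toy model."""
    hess = 2.5 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # hypothetical positive definite Hessian
    grad_h0 = np.zeros(n)
    grad_h0[0], grad_h0[-1] = 1.0, -1.0      # h0 feels only the particles next to the boundary
    grad_obs = np.zeros(n)
    grad_obs[p], grad_obs[p + 1] = 1.0, -1.0  # O(x) = x_p - x_{p+1}, so O' = 1
    flow = covariance_via_flow(hess, grad_h0, grad_obs)
    exact = grad_obs @ np.linalg.solve(hess, grad_h0)
    return flow, exact
```

The two numbers returned by `toy_example()` agree up to quadrature error; in the non-Gaussian case the matrix $M$ is replaced by the path-dependent operator $A(s)$, which is why an expectation over paths appears in (5.46).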

When applying this proposition to our case, we will choose the initial condition $h_0$ to be given by (5.36). The initial condition for the random walk (5.47) is given by $\nabla h_0$. Notice that the leading terms in $\partial_j h_0(x) = V_y'(x_j) - [\tilde V_{\tilde y}]'(x_j)$ cancel; this is because the leading term in (5.8) depends only on the density $\varrho(y)$, which is matched for $y$ and $\tilde y$ by $J_y = J_{\tilde y}$, see (5.7). We thus have

\dM*)\ < (5-49) 
i.e. initially $w$ is small away from the boundary and for the small $\sigma$ regime the inner integral on the r.h.s. of (5.46) is small. After a very long time $w$ becomes constant, but then the right hand side of (5.46) is zero. The analysis of (5.46) requires monitoring what happens to $w$ for coordinates $p$ away from the boundary at intermediate times.

In the following sections we make a few preparations that exclude irrelevant regimes. First, it is easy to see that the regular spacing of $y, \tilde y \in \mathcal{R}$ implies that $W_j(s) \ge cK^{-1}$, which means that the $L^1$-norm of the solution to (5.47) decays exponentially on a time scale of order $K$. Thus the integral in (5.46) can be truncated at $\sigma \le CK \log K$.



5.3.2 Preparation for the De Giorgi-Nash-Moser bound: Restriction to the good paths 

The representation (5.46) expresses the covariance function in terms of the discrete spatial derivative of the solution to (5.47). To estimate $w_p(\sigma; x(\cdot), \sigma) - w_{p+1}(\sigma; x(\cdot), \sigma)$ in (5.46), we will now study the Hölder continuity of the solution $w(s; x(\cdot), \sigma)$ to (5.47) at time $s = \sigma$ and at the spatial point $p$. We will do it for each fixed path $x(\cdot)$, with the exception of a set of "bad" paths that will have a small probability.

Notice that if all points $x_j$ were approximately regularly spaced in the interval $J$, then the operator $B$ would have a kernel $B_{ij} \sim (i-j)^{-2}$, i.e. it would essentially be a discrete version of the operator $|p| = \sqrt{-\Delta}$ (in one dimension). Hölder continuity will thus be the consequence of the De Giorgi-Nash-Moser bound for the parabolic equation (5.47). However, we need to control the coefficients in this equation, which depend on the random walk $x(\cdot)$.
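
The identification of the kernel $(i-j)^{-2}$ with a discrete $\sqrt{-\Delta}$ can be made concrete through its Fourier symbol. Acting on a plane wave $v_k = e^{ik\theta}$, the operator $v \mapsto \sum_{k \ne j} (v_j - v_k)/(j-k)^2$ on the full lattice $\mathbb{Z}$ multiplies it by $\sum_{m \ne 0} (1 - \cos m\theta)/m^2$, and the classical Fourier series gives the closed form $\pi|\theta| - \theta^2/2$ for $|\theta| \le \pi$, which is linear in $|\theta|$ at low frequencies (up to the factor $\pi$). The sketch below compares a truncated sum with this closed form; the function names and the truncation level are illustrative choices, and the infinite lattice is an idealization of the finite index set $I$.

```python
import numpy as np

def truncated_symbol(theta, n_terms=100_000):
    """Symbol sum_{m != 0} (1 - cos(m*theta)) / m^2 of the kernel (i-j)^{-2}, truncated at |m| <= n_terms."""
    m = np.arange(1, n_terms + 1, dtype=float)
    theta = np.atleast_1d(np.asarray(theta, dtype=float))
    return 2.0 * ((1.0 - np.cos(np.outer(theta, m))) / m**2).sum(axis=1)

def closed_form_symbol(theta):
    """Closed form pi*|theta| - theta^2/2, valid for |theta| <= pi."""
    theta = np.abs(np.asarray(theta, dtype=float))
    return np.pi * theta - 0.5 * theta**2

def max_symbol_error(num_points=20):
    """Largest deviation between the truncated sum and the closed form on (0, pi]."""
    thetas = np.linspace(0.05, np.pi, num_points)
    return float(np.max(np.abs(truncated_symbol(thetas) - closed_form_symbol(thetas))))
```

The truncation error is at most $4/n$ for $n$ terms, so `max_symbol_error()` is of order $10^{-5}$ with the default setting.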

For the De Giorgi-Nash-Moser theory we need both upper and lower bounds on the time dependent kernel $B_{ij}(s)$. The rigidity bound (5.33) guarantees a lower bound on $B_{ij}$, up to a factor $K^{-C\xi}$. Since the subexponential probabilistic estimate in (5.33) is very strong, one can easily guarantee a very similar estimate uniformly in time, i.e.

$P^\omega\Big\{ x(\cdot) : \sup_{0 \le s \le CK\log K}\ \sup_{|j| \le K} |x_j(s) - \alpha_j| \le K^{C\xi} \Big\} \ge 1 - e^{-K^{\theta}}$ (5.50)

(maybe after reducing $\theta$ from (5.33)). This follows from the fact that $\omega$ is invariant under the dynamics and $x(t)$ has some stochastic continuity.

The level repulsion estimate implies certain upper bounds on $B_{ij}$, but these estimates are not particularly strong. Even in the $\beta > 1$ case, the bound (5.20) implies only that

$E^\omega B_{i,i+1}(s) = E^\omega \frac{1}{(x_{i+1} - x_i)^2} \le K^{C\xi}$

is finite. In the $\beta = 1$ borderline case even the expectation of $B_{i,i+1}$ is infinite. Such a weak control does not allow us to guarantee an effective simultaneous bound on $B_{i,i+1}$ for all $i$ and for all time. Instead of supremum bounds, we control these coefficients only in an average sense and we can show that for any fixed index $Z \in I$, time $s$ and parameter $M$, we have

$P^\omega\Big\{ x(\cdot) : \frac{1}{s} \int_0^s da\, \frac{1}{M} \sum_{|i-Z| \le M} \sum_j B_{ij}(\sigma - a) \le K^{\rho} \Big\} \ge 1 - K^{C\xi - \rho}.$ (5.51)

Here $\rho$ will be chosen as a large constant times $\xi$. The summation over $j$ is harmless since for sufficiently distant indices $i, j$ the rigidity estimate can be used to bound $B_{ij}$. By a dyadic choice of the parameters $s, M$, it is easy to upgrade (5.51) to hold for any $M \le K$ and $s \le CK \log K$. But it is essential that a reference point $Z$ be fixed; one cannot guarantee that none of the gaps closes.

The expectation over the paths, $\int E_x[\,\cdot\,]\, \omega(dx)$, in (5.46) will be restricted to the sets given in (5.50) and (5.51). Due to the strong subexponential bound, the restriction to the set in (5.50) is unproblematic. However, the estimate (5.51) is quite weak; the probability of the "bad" paths is bounded only by a small negative power of $K$. This is not sufficient to compensate for the time integration in (5.46) even after the upper cutoff $\sigma \le CK \log K$. We will need to use that the heat kernel of the equation (5.47) has an $L^1 \to L^\infty$ decay of order $1/s$ after time $s$. Thus the solution $w_p(\sigma; x(\cdot), \sigma)$ decays as $1/\sigma$, which renders the $d\sigma$ integration in (5.46) harmless.

For completeness, we state the $L^p \to L^q$ heat kernel decay estimate in a general form. Notice that we only assume a lower bound on $B_{ij}$ to guarantee sufficient ellipticity; there is no upper bound required for these bounds.

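Proposition 5.7. Consider the evolution equation

$\partial_s u(s) = -A(s) u(s), \qquad u(s) \in \mathbb{R}^{\mathcal{K}}$ (5.52)

and fix $\sigma > 0$. Suppose that for some constant $b$ we have

$B_{jk}(s) \ge \frac{b}{(j-k)^2}, \qquad 0 \le s \le \sigma, \quad j \ne k,$ (5.53)

and

$W_j(s) \ge \frac{b}{d_j}, \qquad d_j := \big| |j| - K \big| + 1, \qquad 0 \le s \le \sigma.$ (5.54)

Then for any $1 \le p \le q \le \infty$ we have the decay estimate

$\|u(s)\|_q \le (sb)^{-(\frac{1}{p} - \frac{1}{q})} \|u(0)\|_p, \qquad 0 < s \le \sigma.$ (5.55)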
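
The $L^1 \to L^\infty$ case of this decay can be observed numerically in the simplest situation of time-independent coefficients on a regular lattice. The sketch below is a sanity check under explicit assumptions that go beyond the text: $B_{jk} = (j-k)^{-2}$ exactly, $W_j = 4/d_j$ (the factor 4 is a convenience choice that makes $W_j$ dominate the killing rate of the full-lattice walk, while still satisfying the lower bound with $b = 1$), and a delta function at the centre as initial data. It reports $s \|u(s)\|_\infty$, which should stay bounded.

```python
import numpy as np

def build_generator(K, w_scale=4.0):
    """A = B + W on I = {-K..K} with B_jk = 1/(j-k)^2 and W_j = w_scale/d_j (illustrative choices)."""
    idx = np.arange(-K, K + 1)
    diff = idx[:, None] - idx[None, :]
    off = np.zeros(diff.shape)
    mask = diff != 0
    off[mask] = 1.0 / diff[mask].astype(float) ** 2
    d = np.abs(np.abs(idx) - K) + 1          # distance to the boundary, d_j = ||j| - K| + 1
    # [Bv]_j = sum_k B_jk (v_j - v_k)  corresponds to the matrix diag(row sums) - off
    return np.diag(off.sum(axis=1) + w_scale / d) - off

def sup_norm_decay(K=50, times=(1.0, 2.0, 5.0, 10.0, 20.0)):
    """Return s * ||u(s)||_inf for u(s) = exp(-s*A) u(0), u(0) a delta at the centre."""
    A = build_generator(K)
    evals, evecs = np.linalg.eigh(A)         # A is symmetric, so the spectral calculus applies
    u0 = np.zeros(2 * K + 1)
    u0[K] = 1.0                              # ||u(0)||_1 = 1
    coeffs = evecs.T @ u0
    return [s * np.abs(evecs @ (np.exp(-s * evals) * coeffs)).max() for s in times]
```

With these choices the kernel is dominated by the heat kernel of the full-lattice operator, whose diagonal is at most $2/(\pi^2 s)$, so every entry returned by `sup_norm_decay()` should lie well below 1.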

The proof relies on the usual Nash argument and uses the following critical Gagliardo-Nirenberg-type inequality for the discrete version of the operator $\sqrt{-\Delta}$:

Proposition 5.8. There exists a positive constant $C$ such that

$\|f\|_{L^4(\mathbb{Z})}^4 \le C \|f\|_{L^2(\mathbb{Z})}^2 \sum_{i \ne j \in \mathbb{Z}} \frac{|f_i - f_j|^2}{|i-j|^2}$ (5.56)

holds for any function $f : \mathbb{Z} \to \mathbb{R}$.

The continuous version of this inequality, $\|\phi\|_4^4 \le C \|\phi\|_2^2 \langle \phi, |p| \phi \rangle$, was first proven in [70].
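
For orientation, here is one way the Nash argument can be organized for the case $p = 1$, $q = 2$. This is only a sketch with untracked constants $c, C$, assuming that $B_{jk}(s)$ is symmetric, that the evolution is an $L^1$ contraction, and writing $\bar u$ for the extension of $u$ by zero from $I$ to $\mathbb{Z}$; it is not claimed to be the exact proof of Proposition 5.7.

```latex
\begin{align*}
-\partial_s \|u(s)\|_2^2
  &= \sum_{j \ne k} B_{jk}(s)\,(u_j - u_k)^2 + 2 \sum_{j} W_j(s)\, u_j^2
   && \text{(symmetry of $B$)} \\
  &\ge c\, b \sum_{i \ne j \in \mathbb{Z}} \frac{|\bar u_i - \bar u_j|^2}{|i-j|^2}
   && \text{(by (5.53), (5.54) and } \textstyle\sum_{k \notin I} (j-k)^{-2} \le 4/d_j\text{)} \\
  &\ge c\, b\, \frac{\|u\|_4^4}{C \|u\|_2^2}
   && \text{(Proposition 5.8)} \\
  &\ge c\, b\, \frac{\|u\|_2^4}{C \|u\|_1^2}
   && \text{(H\"older: } \|u\|_2^6 \le \|u\|_1^2 \|u\|_4^4\text{)} .
\end{align*}
Since $\|u(s)\|_1 \le \|u(0)\|_1$, integrating
$\partial_s \big( \|u(s)\|_2^{-2} \big) \ge c\, b\, C^{-1} \|u(0)\|_1^{-2}$ gives
\[
  \|u(s)\|_2 \le C' (sb)^{-1/2} \|u(0)\|_1 ,
\]
and the remaining exponents follow by duality and interpolation.
```

The boundary potential $W$ enters exactly where the extension by zero creates extra terms between $I$ and its complement.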

5.3.3 Preparation for the De Giorgi-Nash-Moser bound: Finite speed of propagation 

The Hölder continuity of the parabolic equation (5.47) emerges only after a certain time, thus for the small $\sigma$ regime in the integral (5.46) we need a different argument. Since we are interested in the Hölder continuity around the middle of the interval $I$ (note that $|p| \le K^{1-\xi^*}$ in Theorem 5.5), and the initial condition $\nabla h_0$ is small in this region (see (5.49)), a finite speed of propagation estimate guarantees that $w_p(\sigma; x(\cdot), \sigma)$ is small if $\sigma$ is not too large.

Since (5.47) is linear, for the finite speed of propagation it is sufficient to consider the fundamental solution. For a fixed $a \in I$, let $u^a(s)$ denote the solution of

$\partial_s u^a(s) = -A(s) u^a(s), \qquad u^a_j(0) = \delta_{aj},$ (5.57)

with a delta function as initial data. We will assume that the coefficients of $A$ satisfy, for some fixed $|Z| \le K/2$ and $\rho > 0$, the bound

$\sup_{0 \le s \le \sigma}\ \sup_{1 \le M \le K}\ \frac{1}{1+s} \int_0^s \frac{1}{M} \sum_{i \in I :\, |i-Z| \le M}\ \sum_{j \in I :\, |j-Z| \le M} B_{ij}(\sigma - a)\, da \le C K^{\rho}.$ (5.58)

Notice that (5.58) is satisfied on the set of good paths given by (5.50) and (5.51). The following lemma provides a finite speed of propagation estimate for the equation (5.57) under the condition (5.58). This estimate is not optimal, but it is sufficient for our purpose.

Lemma 5.9. [Finite Speed of Propagation Estimate] Fix $a \in I$ and $\sigma \le CK \log K$. We assume that the coefficients of $A$ satisfy (5.53) and (5.54) with $b = K^{-\xi}$. Assume that (5.58) is satisfied for some fixed $Z$, $|Z| \le K/2$. Then for the fundamental solution (5.57) we have the following estimate for any $s \le \sigma$ and $p \in I$:

K(s)\ < ™— p±i. (5.59) 

\p — a \ 

For the proof, we split the operator $A = \mathcal{S} + \mathcal{R}$ into a short range and a long range part, where the short range part $\mathcal{S}(s)$ is defined by

$(\mathcal{S}(s)v)_j := -\sum_{k :\, |j-k| \le \ell} B_{jk}(s)(v_k - v_j) + W_j(s) v_j$ (5.60)

with some cutoff parameter $\ell$. The norm of the long range part in any $L^p$ is bounded by $\ell^{-1}$ and it is treated as a perturbation via the Duhamel formula. For the short range part, we control the exponentially weighted norm of the solution of $\partial_s r(s) = -\mathcal{S}(s) r(s)$, i.e. we derive a Gronwall bound for

$f(s) = \sum_j e^{|j-a|/\theta}\, r_j^2(s).$

The result is

$f(s) \le \exp\Big[ C\theta^{-2} \int_0^s \sum_{k,j} B_{kj}(s')\, ds' \Big] f(0).$

The exponent is estimated by (5.58) with $M = K$. The optimization of the lengthscale $\theta$ together with the cutoff parameter $\ell$ yields (5.59). □

Inserting the estimate (5.59) into (5.46) and using the estimate (5.49) on the initial data $\nabla h_0$, we obtain that the contribution of the short time regime, $\sigma \le K^{1/4}$, is negligible if $p$ is away from the edge, $|p| \le K^{1-\xi^*}$ for some $\xi^* > 0$. This allows us to disregard the $\sigma \le K^{1/4}$ regime in (5.46) and focus on $\sigma \in [K^{1/4}, CK \log K]$.

5.3.4 A discrete De Giorgi-Nash-Moser bound

We will now treat the main part of the integral (5.46) by parabolic regularity. The preparations in the previous sections ensure that it is sufficient to consider the integration regime $\sigma \in [K^{1/4}, CK \log K]$ and we can assume that the path $x(\cdot)$ is good in the sense of the estimates (5.50) and (5.51). In particular, the rigidity estimate implies not only lower bounds but also upper bounds for distant indices; more precisely we have

$B_{ij}(s) \le \frac{C}{(i-j)^2}$ (5.61)

for any $|i-j| \ge CK^{\xi}$ and $0 \le s \le CK \log K$; and similarly

$W_i(s) \le \frac{K^{\xi}}{d_i}, \qquad \text{if } d_i \ge K^{C\xi}.$ (5.62)

The following regularity theorem combined with (5.49) completes the estimate of (5.46) and completes the proof of Theorem 5.5. □
Theorem 5.10 (Parabolic partial regularity with singular coefficients). Let $u$ be a solution to (5.57), where $u = u^a$ for any choice of $a$. Suppose that the coefficients of $A$ satisfy the lower bounds (5.53) and (5.54) with $b = K^{-\xi}$, the upper bounds (5.61), (5.62) for distant indices and the upper bound in (5.58) in an average sense for all indices. Let $\sigma \in [K^{c_1}, C_1 K \log K]$ be fixed, where $c_1 > 0$ is an arbitrary positive constant. Then for any $0 < q' < 1$ there exists $q > 0$ so that for any $|Z| \le K/2$

$\sup_{\max\{|j-Z|,\, |j'-Z|\} \le \sigma^{1-q'}} |u_j(\sigma) - u_{j'}(\sigma)| \le C \sigma^{-1-q}.$ (5.63)

Notice that this result is deterministic; all probability estimates are contained in verifying the conditions. We also remark that if we define the rescaled function $v(j/K, t) := t\, u_j(t)$, then (5.63) can be interpreted as a type of Hölder regularity of $v$ on scale $\sigma^{1-q'} K^{-1} \ll 1$ at the point $Z/K$:

$|v(x, \sigma) - v(y, \sigma)| \le C\sigma^{-q} \le C|x - y|^{c_1 q}$ (5.64)

for $1/K \le |x - y| \le \sigma^{1-q'}/K$ and $x, y$ near $Z/K$. The Hölder exponent is thus at least $c_1 q$.
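
The passage from (5.63) to (5.64) is a one-line computation. It is spelled out below under the reading that $x = j/K$, $y = j'/K$, that $\sigma \ge K^{c_1}$ as in Theorem 5.10, and that constants are not tracked:

```latex
\begin{align*}
|v(x,\sigma) - v(y,\sigma)|
  &= \sigma\, |u_j(\sigma) - u_{j'}(\sigma)|
   \le C \sigma^{-q}
   && \text{(by (5.63))} \\
  &\le C K^{-c_1 q}
   && (\sigma \ge K^{c_1}) \\
  &\le C |x - y|^{c_1 q}
   && (|x - y| \ge 1/K).
\end{align*}
```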

Although the statement of Theorem 5.10 seems to be complicated, the underlying mechanism is that there is a positive exponent $q$ in (5.63), which to a great degree is a universal constant. This exponent provides an extra smallness factor in addition to the natural size of $u_j(\sigma)$, which is $\sigma^{-1}$ from the $L^1 \to L^\infty$ decay. As (5.64) indicates, this gain comes from a Hölder regularity on the relevant scale.

Our equation (5.57) is of the type considered in [13], but it is discrete and in a finite interval. The key difference, however, is that the coefficient $B_{ij} = (x_i - x_j)^{-2}$ in the elliptic part of (5.57) can be singular if gaps close, even temporarily, while [14] assumed the uniform bound $B_{ij} \le C/|i-j|^2$. The only control we have for the singular behavior of $B_{ij}$ is the estimate (5.58), which is very weak. This estimate essentially says that the space-time maximum function of $B_{j,j+1}(t)$ at a fixed space-time point $(Z, 0)$ is bounded by $K^{\rho}$. Our main task is to show that this condition is sufficient for proving Hölder continuity at the same point. Our strategy follows the approach of Caffarelli-Chan-Vasseur [13]. The main new feature of our argument is the derivation of a local energy dissipation estimate for parabolic equations with singular coefficients satisfying (5.53), (5.54) as lower bounds and only (5.58) as an upper bound. The analogous result in [13], called the first De Giorgi lemma, is proved under uniform bounds on the coefficients. For our proof, roughly, we have to run the argument of the first De Giorgi lemma twice; first we get a bound only in $L^2(\mathbb{Z})$, then using this information we upgrade it to an $L^\infty(\mathbb{Z})$ bound. This concludes the sketch of the proof of Theorem 5.10. □



5.4 From local measures to Wigner matrices and β-ensembles

Given Theorem 5.1, the proofs of Theorems 1.3 and 1.5 follow relatively standard ideas from previous results, some of which were reviewed in Sections 3 and 4. The key inputs are to verify the condition (5.13) and to ensure that the configuration intervals coincide (5.11).

For the β-ensemble, (5.13) simply follows from conditioning the global rigidity estimate in Theorem 4.1. For matching the configuration intervals, first we match the local density by a scaling and translation that guarantees that $|J_y| \approx |J_{\tilde y}|$, see (5.12). Then, with a second scaling, we fine tune the slight discrepancy between the lengths of $J_y$ and $J_{\tilde y}$. This finishes the proof of Theorem 1.5. □

In the Wigner case, we always work on the same configuration interval, so matching of $J$ is automatic. The proof of (5.13), however, requires a bit more effort than for the β-ensemble, but it will relatively easily follow from other information we already collected along the three step strategy described in Section 1.5. As we explained in the proof of Theorem 1.2, the averaging over the energy was really needed only in the second step, where the closeness of the local statistics of $f_t\mu$ and $\mu$ was shown for small $t$, where $f_t$ is the evolution of the DBM. As a byproduct of this step, we obtain bounds on the global entropy and Dirichlet form, see (3.38). In particular, the local Dirichlet form w.r.t. $\mu_y$ can be estimated by the global one, which then can be used to compare expectations w.r.t. the conditional measures $f_{t,y}\mu_y$ and $\mu_y$:

$E^{f_{t,y}\mu_y} O(x) - E^{\mu_y} O(x).$ (5.65)

We are especially interested in controlling the difference

$|E^{f_{t,y}\mu_y} x_j - E^{\mu_y} x_j| \le C K^{\xi} N^{-1}.$ (5.66)

Since $E^{f_t\mu} x_j$ is close to its classical location $\gamma_j$ by rigidity (5.16) for Wigner matrices, after conditioning, we obtain that $E^{f_{t,y}\mu_y} x_j$ is also close to $\gamma_j$, at least for most $y$ w.r.t. $f_t\mu$. Combining this information with (5.66) yields (5.13). Therefore Theorem 5.1 applies and we will use it for a Gaussian case, $V(x) = \tilde V(x) = x^2/2$, but with two different boundary conditions $y, \tilde y \in \mathcal{R}$. Since this holds for most $\tilde y$ w.r.t. the measure $\mu$, it also holds for $\mu$ itself, i.e. the gap statistics of $\mu_y$ and $\mu$ coincide. On the other hand, the estimate (5.65) applied to the observable $O(x_j - x_{j+1})$ implies directly that the single gap distributions w.r.t. $f_{t,y}\mu_y$ and $\mu_y$ coincide for most $y$ w.r.t. $f_t\mu$. Finally, the gap statistics of $f_{t,y}\mu_y$ and $f_t\mu$ coincide for most $y$ w.r.t. $f_t\mu$ by conditioning. Putting these relations together we obtain that the gap statistics of $f_t\mu$ and $\mu$ coincide, i.e. the local measures, which played an important auxiliary role, are eliminated.

Finally, the small Gaussian component present in $f_t\mu$ for small but non-zero $t$ can be removed by the Green function comparison theorem, Theorem 3.6. Although the direct application of the Green functions gives information only on eigenvalues around a fixed energy and not on an eigenvalue with a fixed label, the estimates are strong enough to transfer fixed energy information to fixed label. The main reason for this flexibility is that Theorem 3.6 allows for very small $\eta \sim N^{-1-\varepsilon}$, i.e. well below the typical spacing. Indeed, Theorem 1.10 from [58] implies that if the first four moments of two generalized Wigner ensembles, $H^{\mathbf{v}}$ and $H^{\mathbf{w}}$, are the same, then we have

$\lim_{N \to \infty} [E^{\mathbf{v}} - E^{\mathbf{w}}]\, O\big( N(x_j - x_{j+1}), N(x_j - x_{j+2}), \ldots, N(x_j - x_{j+n}) \big) = 0.$ (5.67)

Roughly speaking, the proof of (5.67) in [58] was based on Theorem 3.6. In order to convert fixed energy to a fixed eigenvalue index, one needs to know that the total number of eigenvalues up to a fixed energy is the same for the two ensembles. The total number of eigenvalues up to a fixed energy $E$ can be expressed in terms of an integral of the imaginary part of the trace of the Green function, i.e.,

$\int_{-\infty}^{E} \operatorname{Im} \operatorname{Tr} \frac{1}{H - (y + i\eta)}\, dy$

with an $\eta$ slightly smaller than $1/N$. Thus the basic idea of the Green function comparison theorem can be employed and this leads to (5.67). This completes the proof of Theorem 1.3. □
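
The relation between the eigenvalue counting function and the Green function can be tested directly on a sampled matrix. The sketch below makes several assumptions that are not in the text: a real symmetric Gaussian matrix serves as a stand-in for a generalized Wigner matrix, the integral is normalized by $1/\pi$, the energy $E$ is placed in the middle of the widest bulk gap, and $\eta$ is taken far below the typical spacing (much smaller than the text requires) so that the agreement is easy to see. Per eigenvalue $\lambda$ the integral is explicit: $\int_{-\infty}^{E} \operatorname{Im} (\lambda - y - i\eta)^{-1} dy = \arctan((E - \lambda)/\eta) + \pi/2$.

```python
import numpy as np

def smoothed_count(eigs, energy, eta):
    """(1/pi) * int_{-inf}^{E} Im Tr (H - y - i*eta)^{-1} dy, in closed form per eigenvalue."""
    return float(np.sum(np.arctan((energy - eigs) / eta) / np.pi + 0.5))

def counting_demo(n=200, seed=0):
    """Return (smoothed count, exact count) for a sampled symmetric Gaussian matrix."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n, n))
    h = (a + a.T) / np.sqrt(2.0 * n)         # spectrum roughly in [-2, 2]
    eigs = np.linalg.eigvalsh(h)             # sorted in ascending order
    bulk = np.arange(n // 4, 3 * n // 4)
    m = bulk[np.argmax(eigs[bulk + 1] - eigs[bulk])]   # widest bulk gap keeps E away from eigenvalues
    energy = 0.5 * (eigs[m] + eigs[m + 1])
    eta = 1e-3 / n                           # far below the typical spacing of order 1/n
    return smoothed_count(eigs, energy, eta), int(np.sum(eigs <= energy))
```

The smoothed count differs from the integer count by a small fraction, which is the sense in which Green function estimates at scale $\eta$ below the spacing carry information about eigenvalue labels.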